ABSTRACT



This dissertation investigates the use of multivariate adaptive regression splines (MARS), 
due to Friedman, for nonlinear regression modeling and analysis of time series systems. 
MARS can be conceptualized as a generalization of recursive partitioning that uses spline 
fitting in lieu of other simple fitting functions. MARS is a computationally intensive method- 
ology that fits a nonparametric regression model in the form of an expansion in product 
spline basis functions of predictor variables chosen during a forward and backward recur- 
sive partitioning strategy. The MARS algorithm produces continuous nonlinear regression 
models for high-dimensional data using a combination of predictor variable interactions and 
partitions of the predictor variable space. 

By letting the predictor variables in the MARS algorithm be lagged values of a time 
series system, one obtains a univariate (ASTAR) or semi-multivariate (SMASTAR) adaptive 
spline threshold autoregressive model for nonlinear autoregressive threshold modeling and 
analysis of time series, thereby extending the threshold autoregression (TAR) time series 
methodology developed by Tong. The models seem well suited for taking into account
the complex interactions among multivariate, cross-correlated, lagged predictor variables 
of a time series system. A significant feature of this time series application of MARS is 
its ability to produce models with limit cycles when modeling time series data that exhibit 
periodic behavior. In a physical context, limit cycles represent a stationary state of sustained 
oscillations.

A difficulty faced during regression modeling is the problem of model selection, i.e.,
choosing the appropriate model dimension and model predictor variables. Currently, a
modified form of generalized cross validation (GCV*), first suggested by Craven and Wahba,
is used for model selection within the MARS algorithm. However, a question that
immediately arises is whether GCV* is the ‘best’ criterion for model selection when using
serially and cross-correlated time series data. Using mean squared error (MSE) as a measure
of performance, simulations show that other model selection criteria, in particular the
Schwarz-Rissanen (SC) criterion, can improve model selection over GCV*.
I. INTRODUCTION



Most research in, and applications of, time series modeling and analysis has been con- 
cerned with linear models. This is due to the maturity of the theory for linear time series, 
and the numerous studies and statistical packages that exist to facilitate the use of linear 
time series models. However, nonlinear time-dependent systems are common, and linear models
often do not handle them adequately. For these systems we need to
consider general classes of nonlinear models that readily adapt to the precise form of a 
nonlinear system of interest. This dissertation is an investigation of the use of multivariate 
adaptive regression splines for the systematic autoregressive modeling and analysis of non- 
linear univariate and semi-multivariate time series systems. This chapter provides a brief 
introduction to regression modeling and multivariate adaptive regression spline modeling 
(MARS), briefly discusses the application of MARS to time series systems, identifies the 
contributions of this dissertation, and gives an outline of the chapters that follow.

A. BACKGROUND 

Regression modeling is a popular statistical approach that serves as a basis for studying 
a system of interest. Regression modeling is used to develop a mathematical model of 
the relationships that exist between the dependent (output) variable and the independent 
(explanatory) variables of the system. Classical methods for developing the functional form 
of the regression model are based on previous knowledge of the system and on considerations 
such as smoothness and continuity of the output variable as a function of the explanatory 
(predictor) variables. 

To provide a framework for a regression modeling methodology, let $y$ represent a
single response variable that depends on a vector of $p$ predictor variables $x$, where
$x = (x_1, \ldots, x_v, \ldots, x_p)$. Assume there are given $N$ samples of $y$ and $x$, namely
$\{y_i, x_i\}_{i=1}^{N}$, and that $y$ is described by the (additive noise) regression model

$$y = f(x_1, \ldots, x_p) + \epsilon \qquad (1)$$

over some domain $D \subset \mathbb{R}^p$, which contains the data. The function $f(x)$ reflects the true
but unknown relationship between $y$ and $x$. The random additive error variable $\epsilon$, which
is assumed to have mean zero and variance $\sigma^2$, reflects the dependence of $y$ on quantities
other than $x$. The goal of a regression modeling methodology is to formulate a function
$\hat{f}(x)$ that is a reasonable approximation of $f(x)$ over the domain $D$.

Both parametric and nonparametric regression modeling methodologies provide useful
methods for developing regression models. If the correct parametric form of $f(x)$ is
known, then we can use global parametric regression modeling to estimate a finite number
of unknown coefficients. Draper and Smith (1966) provide an extensive discussion of classical
parametric regression modeling techniques.

The most frequently used and well-known form of parametric regression modeling is
ordinary least squares regression, which estimates $f(x)$ from (1) using

$$\hat{y} = \hat{f}(x) = Hy \qquad (2)$$

where $H$ is the well-known projection or ‘hat’ matrix $H = X(X'X)^{-1}X'$ and $X$ is the
design matrix of predictor values. Parametric regression models require less data than
nonparametric regression models, and their properties are readily computed. However, the
proper use of parametric regression modeling requires knowledge of the approximate
parametric form of the underlying function $f(x)$, which becomes increasingly difficult to
specify as the dimension $p$ of the predictor variable space becomes large.
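To make (2) concrete, the following sketch builds the hat matrix for the simplest case, a straight-line fit with an intercept, where $(X'X)^{-1}$ has a closed form and $H_{il} = 1/N + (x_i - \bar{x})(x_l - \bar{x})/S_{xx}$ with $S_{xx} = \sum_i (x_i - \bar{x})^2$. The function names and toy data are illustrative assumptions, and plain Python lists are used so that no linear algebra library is needed. It shows that the fitted values are a fixed linear combination of the responses and that $H$ is a projection.

```python
def hat_matrix_line(x):
    """Hat matrix H = X (X'X)^{-1} X' for the straight-line design [1, x].

    For this two-column design the inverse has a closed form, giving
    H[i][l] = 1/N + (x_i - xbar)(x_l - xbar) / Sxx.
    """
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [[1.0 / n + (xi - xbar) * (xl - xbar) / sxx for xl in x] for xi in x]


def mat_vec(a, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def mat_mat(a, b):
    """Matrix-matrix product for matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
H = hat_matrix_line(x)
y_hat = mat_vec(H, y)        # fitted values of eq. (2), here 0.04 + 1.0 * x
HH = mat_mat(H, H)           # a projection: HH equals H up to rounding
trace_H = sum(H[i][i] for i in range(len(x)))   # equals 2, the number of coefficients
```

The trace of $H$ equals the number of fitted coefficients; the analogous trace of a smoother matrix is what generalized cross validation treats as the effective number of parameters.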

In this dissertation the approach is focused on nonparametric regression modeling
(see, e.g., Eubank, 1988). It is only assumed that $f(x)$ belongs to a general collection
of functions, and the data are used to determine the final model form and its associated
coefficients; i.e., the form of $f(x)$ is not rigidly specified. The most common nonparametric
regression methodologies use local parametric (linear smoothing) approximations, or
spline smoothing approximations, to estimate the underlying function $f(x)$ (Thisted, 1988).

One difficulty with applying existing nonparametric regression modeling methodologies
to problems of dimension greater than two has been called the curse-of-dimensionality
(Bellman, 1961). The curse-of-dimensionality describes the need for an exponential increase
in sample size $N$ for a linear increase in $p$ in order to densely populate higher-dimensional
predictor variable spaces. In effect, the curse-of-dimensionality limits the practical
application of some nonparametric regression modeling methodologies to problems of low
dimension.
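A quick calculation makes the exponential growth concrete. Under the simplifying assumption (made only for this illustration) that the data are uniform on the unit hypercube $[0,1]^p$, a cubical neighborhood that captures a fraction $r$ of the data must have edge length $r^{1/p}$, and keeping $m$ sample points per axis on a full grid requires $N = m^p$ samples. The helper names are hypothetical.

```python
def neighborhood_edge(fraction, p):
    """Edge length of a sub-cube of [0, 1]^p holding `fraction` of uniform data."""
    return fraction ** (1.0 / p)


def grid_sample_size(points_per_axis, p):
    """Samples needed for a full grid with `points_per_axis` values on each of p axes."""
    return points_per_axis ** p


for p in (1, 2, 3, 5, 10):
    # A "local" neighborhood holding 10% of the data stops being local as p grows,
    # and the sample size needed to keep 10 points per axis grows as 10**p.
    print(p, round(neighborhood_edge(0.10, p), 3), grid_sample_size(10, p))
```

With $p = 10$, the neighborhood must span almost 80% of each predictor's range to contain 10% of the data, so it is no longer local in any useful sense.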

Linear smoothing is a form of nonparametric regression that estimates $f(x)$ from (1)
with

$$\hat{f}(x) = Sy \qquad (3)$$

where $S$ is an $N \times N$ matrix. As in ordinary least squares regression (2), the matrix $S$
depends only on the $X$ matrix. However, in linear smoothing the $S$ matrix can be a
nonlinear function of the $X$ matrix (Thisted, 1988). In general, linear smoothers compute
the estimate of $f(x)$ at $x_i$ using some localized neighborhood of data around $x_i$. Some
common linear smoothers include running means, kernel smoothing, and running lines (see,
e.g., Altman, 1987, and Cleveland, 1979). Although a nonparametric regression model using
linear smoothing is rapidly computed, the estimate of $f(x)$ can be poor at the extremes of
the predictor variable space due to the endpoint behavior of the linear smoother. In addition,
the curse-of-dimensionality limits the practical application of some linear smoothers to a
low-dimensional setting, i.e., small $p$. Altman (1987) found that, when serial correlation
is present in the data, some linear smoothers systematically undersmooth or oversmooth
their estimate of $f(x)$. Serial correlation in the data can even plague more sophisticated
nonlinear smoothers, such as SUPERSMOOTHER (Friedman, 1984).
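As an illustration of (3), the sketch below builds the smoother matrix $S$ for a running mean over a symmetric window of $k$ neighbors on each side, truncated at the ends of the data. It assumes a single predictor whose values are already sorted; the function name and the window rule are illustrative choices. Here $S$ depends only on the ordering of the $x_i$, and the truncated, one-sided windows near the ends are one source of the endpoint behavior mentioned above.

```python
def running_mean_smoother(x, k):
    """Smoother matrix S for a running mean over the 2k+1 nearest ranks.

    x is assumed sorted; only its length and order are used. Near the ends
    the window is truncated, so those rows average one-sided neighborhoods.
    """
    n = len(x)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lo, hi = max(0, i - k), min(n - 1, i + k)
        w = 1.0 / (hi - lo + 1)
        for l in range(lo, hi + 1):
            S[i][l] = w
    return S


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]      # y = x**2
S = running_mean_smoother(x, k=1)
f_hat = [sum(s * yl for s, yl in zip(row, y)) for row in S]
# Interior estimates are biased by 2/3, but at x = 5 the one-sided window
# gives 20.5 instead of 25, a much larger error at the boundary.
```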

Spline smoothing approximations are a special form of linear smoothing (3) that
are particularly attractive as nonparametric regression models because they arise as the
solutions to optimization problems closely related to least squares and maximum likelihood
(Thisted, 1988). Silverman (1985) views spline smoothing approximations as bridging
parametric and nonparametric regression methodologies. An excellent survey and discussion
of splines in statistics is provided in papers by Wegman and Wright (1983) and Silverman
(1985).

Roughness penalty methods and regression splines are two forms of spline smoothing.
Spline smoothing approximations that use roughness penalty methods to estimate $f(x)$ are
very robust regression modeling methodologies. However, roughness penalty methods are
hampered by the curse-of-dimensionality and the large number of coefficients that must
be computed for large $p$. Regression splines seek to overcome the difficulties of roughness
penalty methods but still require a methodology for selecting the number and location of
the spline knots for the regression model.
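A regression spline with fixed knots is simply an ordinary least squares fit on a spline basis. The sketch below assumes the linear truncated power basis $\{1, x, (x - t_1)_+, \ldots, (x - t_K)_+\}$ with user-supplied knots; choosing the number and location of those knots is exactly the difficulty noted above. The basis choice, the helper names, and the small Gaussian-elimination solver are illustrative assumptions.

```python
def truncated_linear_basis(x, knots):
    """Row [1, x, (x - t_1)_+, ..., (x - t_K)_+] of the linear truncated power basis."""
    return [1.0, x] + [max(0.0, x - t) for t in knots]


def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def fit_regression_spline(xs, ys, knots):
    """Least squares coefficients from the normal equations B'B beta = B'y."""
    B = [truncated_linear_basis(x, knots) for x in xs]
    q = len(B[0])
    btb = [[sum(row[i] * row[j] for row in B) for j in range(q)] for i in range(q)]
    bty = [sum(row[i] * y for row, y in zip(B, ys)) for i in range(q)]
    return solve(btb, bty)


xs = [i / 10.0 for i in range(21)]       # 0.0, 0.1, ..., 2.0
ys = [abs(x - 1.0) for x in xs]          # a "V" with a kink at x = 1
beta = fit_regression_spline(xs, ys, knots=[1.0])
# |x - 1| = 1 - x + 2 (x - 1)_+, so beta is close to [1, -1, 2]
```

With the knot placed at the kink the fit is exact; with a poorly placed knot it would not be, which is why knot selection matters.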

Multivariate Adaptive Regression Splines (MARS) (Friedman, 1988) is a new method 
of flexible nonparametric regression spline modeling that appears to be an improvement 
over existing regression modeling methodologies when using moderate sample sizes N and 
predictor spaces with dimension p > 2. In the regression context, MARS can be conceptu- 
alized as a generalization of a recursive partitioning strategy (Morgan and Sonquist, 1963; 
Breiman et al., 1984) that uses regression splines in lieu of other simple fitting functions. 
Given a set of predictor variables, MARS fits a model in the form of an expansion in product 
spline basis functions of predictors chosen during a forward and backward recursive parti- 
tioning strategy. Although MARS is a computationally intensive regression methodology, it 
provides a systematic (automatic) approach to regression model building that can produce 
continuous models for high-dimensional data with multiple partitions and predictor variable 
interactions. 

Although MARS is capable of regression modeling in low-dimensional environments,
i.e., those for which $p \le 2$, its primary advantages exist in higher-dimensional predictor
spaces where, as discussed above, many regression methodologies are plagued by the
curse-of-dimensionality. The curse-of-dimensionality cannot be overcome if the data used
in constructing $\hat{f}(x)$ exhibit no special structure (Friedman, 1988). In general, however,
this is not the case. Thus, MARS attempts to overcome the curse-of-dimensionality by
exploiting the localized low-dimensional structure of the data (where it exists) used in
constructing $\hat{f}(x)$. Note that in this dissertation the approach taken to explain and apply
MARS is geometric in nature; the focus is on the iterative formation of overlapping
subregions in the domain $D$ of the predictor variables. Each subregion of the domain is
associated with a product spline basis function. MARS approximates the unknown function
$f(x)$ using the set of product spline basis functions associated with the overlapping
subregions of the domain.
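The link between product spline basis functions and subregions can be seen by evaluating one. The sketch assumes linear truncated spline factors $[+(x_v - t)]_+$ and $[-(x_v - t)]_+$ (the linear case is an assumption of this sketch, as are the names and knot values); a product basis function multiplies such factors over different predictors and is therefore nonzero only on a subregion of $D$.

```python
from typing import Sequence, Tuple

# One factor of a product basis function: (sign, predictor index v, knot t).
# sign = +1 gives (x_v - t)_+, sign = -1 gives (t - x_v)_+.
Factor = Tuple[int, int, float]


def hinge(sign: int, xv: float, t: float) -> float:
    """Truncated linear function [sign * (x_v - t)]_+."""
    return max(0.0, sign * (xv - t))


def product_basis(x: Sequence[float], factors: Sequence[Factor]) -> float:
    """Product spline basis function: the product of its truncated linear factors."""
    value = 1.0
    for sign, v, t in factors:
        value *= hinge(sign, x[v], t)
    return value


# Interaction basis function (x[0] - 0.5)_+ * (1.0 - x[1])_+ (0-based indices).
# It is positive only on the subregion {x[0] > 0.5, x[1] < 1.0} and zero elsewhere.
example_factors = [(+1, 0, 0.5), (-1, 1, 1.0)]
print(product_basis([0.8, 0.2], example_factors))   # inside the subregion
print(product_basis([0.3, 0.2], example_factors))   # outside: x[0] <= 0.5
```

Unlike the indicator functions of recursive partitioning, these factors are continuous at the knot, which is why a model built from them is continuous in the predictor variables.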

What about the use of MARS in a time series setting? By letting the predictor variables
in the MARS algorithm for the $r$th value in the time series be its lagged values, i.e.,
$x_{r-1}, x_{r-2}, \ldots, x_{r-p}$, one obtains an adaptive spline threshold autoregressive (ASTAR)
time series model. In the multivariate time series setting, i.e., where the predictor variables 
are not only the lagged values of the object time series but also the lagged values of other 
cross-correlated time series, the application of MARS results in a semi-multivariate ASTAR 
(SMASTAR) time series model. Thus the MARS methodology gives a new method for 
nonlinear modeling of univariate and multivariate time series and a systematic way of fitting 
the model to the data. The ASTAR and SMASTAR methodologies extend the threshold 
autoregression (TAR) methodology developed by Tong (1990) and seem well suited for 
taking into account the complex interactions among the univariate or multivariate, lagged 
predictor variables of a time series system. 
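In practice, forming the ASTAR predictors amounts to building a lagged design matrix. The sketch below (the helper name is hypothetical) turns a series $x_1, \ldots, x_T$ into responses $x_r$ and predictor rows $(x_{r-1}, \ldots, x_{r-p})$ for $r = p+1, \ldots, T$; for SMASTAR one would, under the same assumptions, append the lagged values of the other cross-correlated series to each row.

```python
from typing import List, Sequence, Tuple


def lagged_design(series: Sequence[float], p: int) -> Tuple[List[List[float]], List[float]]:
    """Return (X, y) where each row of X holds the p values preceding the response.

    Predictor order within a row is (x_{r-1}, x_{r-2}, ..., x_{r-p}), so
    column v-1 holds lag v. The first p observations have no complete
    history and appear only as predictors.
    """
    X, y = [], []
    for r in range(p, len(series)):
        X.append([series[r - lag] for lag in range(1, p + 1)])
        y.append(series[r])
    return X, y


X, y = lagged_design([10, 11, 12, 13, 14], p=2)
# X == [[11, 10], [12, 11], [13, 12]],  y == [12, 13, 14]
```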

A significant feature of this application of MARS is its ability to produce nonlinear 
models with limit cycles when modeling time series data that exhibit periodic behavior. 
In a physical context, limit cycles represent a stationary state of sustained oscillations, a 
satisfying behavior for any model of a time series with periodic behavior. Many time series 
such as the Canadian Lynx data, the Wolf sunspot data, and many riverflow data sets exhibit
‘periodic’ behavior. The Lynx data and Wolf sunspot data are quasi-periodic. However,
riverflow data is frequently tied to a fixed yearly oscillation that can dominate the structure 
of the time series. 
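Whether a fitted autoregressive model has a limit cycle can be checked numerically by iterating its noise-free "skeleton" and watching where the trajectory settles. The sketch below uses a hypothetical one-lag model built from truncated linear functions, $x_r = -2x_{r-1} + 2(x_{r-1} - 0.5)_+ - 2(-0.5 - x_{r-1})_+$, which equals $-2x_{r-1}$ clipped to $[-1, 1]$; it is not a model from this dissertation. Its only fixed point, 0, is unstable, and trajectories started near (but not at) 0 settle onto the period-2 cycle $\{1, -1\}$.

```python
def skeleton(x_prev: float) -> float:
    """Hypothetical spline AR(1) skeleton: -2 * x_prev clipped to [-1, 1]."""
    return (-2.0 * x_prev
            + 2.0 * max(0.0, x_prev - 0.5)
            - 2.0 * max(0.0, -0.5 - x_prev))


def iterate(f, x0: float, steps: int) -> list:
    """Return the trajectory x0, f(x0), f(f(x0)), ... of length steps + 1."""
    path = [x0]
    for _ in range(steps):
        path.append(f(path[-1]))
    return path


path = iterate(skeleton, 0.1, 20)
tail = path[-4:]   # after a short transient the values alternate between -1 and 1
```

The cycle is attracting because the skeleton is flat for $|x| > 0.5$, so small perturbations of the cycle values are removed in one step.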

B. CONTRIBUTIONS OF THIS DISSERTATION 

Much as Yule’s (1927) application of least squares regression to linear time series 
motivated the development of linear autoregressive (AR) modeling, the application of mul- 
tivariate adaptive regression splines (MARS) to time series systems provides a new and 
innovative approach for nonlinear time series modeling. The application of MARS to time 
series systems is a major contribution of this thesis. The systematic (automatic) approach 
for model building provided by MARS gives an interpretable representation for a nonlinear 
time series modeling methodology called adaptive spline threshold autoregression (ASTAR) 
for univariate time series systems and semi-multivariate ASTAR (SMASTAR) for multivari- 
ate time series systems. However, the functional form of an ASTAR or SMASTAR model, 
with the combination of different predictor variables and multiple partitions of the
predictor variable space, makes their straightforward interpretation and analysis difficult.
To address this, a graphical and hierarchical representation was developed to permit
interpretation and analysis of ASTAR and SMASTAR models.

The ASTAR and SMASTAR methodologies turn out to be a generalization of, and 
extension of, the nonlinear threshold autoregressive (TAR) methodology developed by Tong 
(1990). The development of TAR models in the late seventies provided a basis for the ‘prac- 
tical’ modeling and investigation of nonlinear univariate and multivariate time series systems 
(Tong, 1980). Univariate and semi-multivariate TAR models are general enough to capture 
some nonlinear phenomena (such as limit cycles), provide predictive capability, appear to
improve upon linear models when used to model nonlinear systems, and provide a much 
wider class of time series models than available previously. However, in general, TAR models 
are piecewise, discontinuous, linear autoregressive time series models of disjoint subregions 
in the domain of the predictor variables. Also, the ability of the TAR methodology to iden- 
tify the complex interactions between cross-correlated lagged predictor variables, especially 
in the case of a multivariate time series system, is limited by the threshold selection process. 
In contrast, ASTAR and SMASTAR models provide a more general class of nonlinear time
series models that are continuous in the domain of the predictor variables. The systematic 
(automatic) approach for developing ASTAR and SMASTAR models seems well suited for 
taking into account the complex interactions among the univariate and multivariate lagged 
predictor variables of a time series system. When used for prediction, ASTAR and SMAS- 
TAR models are a significant improvement over other existing nonlinear models of the time 
series investigated in this dissertation. 

To facilitate the application of MARS to time series systems, Fortran program sub- 
routines were developed. The input programs permit the user to identify and enter the 
necessary information for initiating the MARS methodology in a time series setting. The 
output programs provide ASTAR and SMASTAR model output in a form that facilitates 
model analysis. In addition, various subroutines written in APL are available to permit 
graphical and statistical analysis of ASTAR and SMASTAR models using programs such 
as IBM’s GRAFSTAT. 

One difficulty that is often faced during regression modeling is the problem of model
selection, i.e., choosing the appropriate model predictor variables and model dimension.
Friedman (1988) uses a modified form of generalized cross validation (GCV*), first suggested
by Craven and Wahba (1979), for model selection within MARS. However, a question
that immediately arises is whether GCV* is the ‘best’ criterion for model selection when
using serially and cross-correlated time series data. Other model selection criteria, such
as Akaike’s Information Criterion (AIC) (Akaike, 1974), have been suggested for model
development in a standard linear time series setting. Using simulations and mean squared
error (MSE) as a performance measure, it is shown that other model selection criteria, in
particular the SC (Schwarz, 1978; Rissanen, 1987) criterion, are an improvement over the
GCV* criterion when modeling time series with MARS.
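For concreteness, the sketch below computes the three criteria in the forms most commonly quoted for a least squares fit with residual sum of squares RSS, $N$ observations, and an effective number of parameters $C$: GCV as $(\mathrm{RSS}/N)/(1 - C/N)^2$, AIC as $N \ln(\mathrm{RSS}/N) + 2C$, and SC as $N \ln(\mathrm{RSS}/N) + C \ln N$ (Gaussian errors, additive constants dropped). As we understand it, GCV* inflates $C$ beyond the number of basis functions to charge for knot selection; here $C$ is simply an input, and the exact penalties used in Chapter V are not assumed. The two candidate models are hypothetical.

```python
import math


def gcv(rss: float, n: int, c: float) -> float:
    """Generalized cross validation: mean RSS divided by the squared penalty factor."""
    return (rss / n) / (1.0 - c / n) ** 2


def aic(rss: float, n: int, c: float) -> float:
    """Akaike's information criterion for Gaussian errors, up to an additive constant."""
    return n * math.log(rss / n) + 2.0 * c


def sc(rss: float, n: int, c: float) -> float:
    """Schwarz-Rissanen criterion for Gaussian errors, up to an additive constant."""
    return n * math.log(rss / n) + c * math.log(n)


# Two hypothetical candidate models fitted to the same N = 100 observations:
# the larger one fits better but uses 6 more effective parameters.
n = 100
small, large = (120.0, 5.0), (100.0, 11.0)     # (RSS, effective parameters C)
for name, crit in (("GCV", gcv), ("AIC", aic), ("SC", sc)):
    s = crit(small[0], n, small[1])
    l = crit(large[0], n, large[1])
    print(name, "prefers", "small" if s <= l else "large")
```

With these numbers GCV and AIC favor the larger model, while SC, whose penalty grows with $\ln N$, favors the smaller one: a heavier complexity penalty can change which model is selected.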

C. OUTLINE OF THIS DISSERTATION 

Chapter II provides an introduction to the recursive partitioning and regression spline 
methodologies that form the foundation for the development of the MARS methodology. 
This is followed by the development of the ASTAR time series model that results when the 
MARS algorithm is applied to univariate time series. Simulations are used to demonstrate 
the ability of ASTAR to detect and model simple linear and nonlinear time series. As 
an example of ASTAR modeling in a more difficult setting, the last section of Chapter 
II considers the widely studied yearly Wolf sunspot numbers, a nonlinear time series with 
periodic behavior. When used for prediction, ASTAR models are a significant improvement 
over other existing nonlinear models of the Wolf sunspot numbers. Chapter III discusses the 
semi-multivariate time series extension of ASTAR (called SMASTAR for semi-multivariate
adaptive spline threshold autoregression); i.e., the univariate time series to be modeled uses
as predictors not only its own lagged values but also the lagged values of other related
time series. This approach seems well suited for taking into account the complex interactions
among multivariate, cross-correlated, lagged predictor variables of a time series system.
Analysis of an Icelandic river using past riverflow, temperature and precipitation is used as 
an example to demonstrate this extension of MARS. The use of this example is predicated 
on the fact that riverflow time series are very difficult to model because they are frequently 
nonlinear and nonnormal, and also because this Icelandic riverflow was analyzed by Tong 
et al. (1985) using semi-multivariate TAR models. Chapter IV explains the development of 
Fortran subroutines to permit the application of MARS to univariate and semi-multivariate 
time series systems. An example is provided using 12 years of daily sea-surface temperatures,
a large univariate time series with periodic behavior. Chapter V is a discussion of the
problem of model selection within MARS. Using simulations and mean squared error (MSE)
as a performance measure, it is shown that other model selection criteria, in particular the
SC (Schwarz, 1978; Rissanen, 1987) criterion, are an improvement over the GCV* criterion
used in MARS by Friedman when modeling time series.

D. SUMMARY 

MARS is a new nonparametric modeling methodology, due to Friedman, that utilizes 
low-order regression spline modeling and a modified recursive partitioning strategy to exploit
the localized low-dimensional behavior of the data used to construct $\hat{f}(x)$. MARS is
a computationally intensive regression methodology that selects a regression model using 
exhaustive search procedures. However, MARS provides a systematic (automatic) regres- 
sion methodology for deriving nonlinear threshold models for high-dimensional data that 
are naturally continuous in the domain of the predictor variables and can have multiple 
partitions and predictor variable interactions. 

By letting the predictor variables in MARS be lagged values of a univariate time 
series, one obtains an adaptive spline threshold autoregressive (ASTAR) time series model, 
which is a new, computationally intensive method for the systematic nonlinear modeling 
of a univariate time series system. The MARS methodology is easily extended to the 
semi-multivariate nonlinear modeling of a single object time series in a multivariate time
series system (SMASTAR); this approach is well suited to take into account the complex
and possibly nonlinear interactions among cross-correlated, lagged predictor variables of a
multivariate time series system. Also, simulations suggest other model selection criteria,
such as the SC (Schwarz, 1978; Rissanen, 1987) criterion, for use within MARS when
modeling in a time series setting. Fortran programs are available for implementing MARS 
in a time series setting; the drivers for the Fortran programs are given in Appendices A, 
B, and C. 
II. NONLINEAR MODELING OF UNIVARIATE TIME SERIES
USING MULTIVARIATE ADAPTIVE REGRESSION SPLINES (MARS)



A. INTRODUCTION 

This chapter introduces MARS, due to Friedman (1988), a new methodology for regres- 
sion analysis which, when applied to nonlinear time series, extends the nonlinear threshold 
autoregression methodology (TAR) developed by Tong (1985). To motivate the development
of the MARS procedure, Sections B and C of this chapter briefly review recursive
partitioning and regression splines. Section D of this chapter is a discussion of Friedman’s 
innovations used to develop MARS. An algorithm for implementing MARS is addressed in 
Section E of this chapter. The application of MARS to univariate time series for develop- 
ing adaptive spline threshold autoregression (ASTAR) models is discussed in Section F of 
this chapter. The final part of Section F is an application of MARS to the Wolf sunspot 
numbers, an often studied univariate time series with periodic behavior. 

The approach taken to explain and apply MARS is geometric in nature, i.e., the 
iterative formation of overlapping subregions in the domain D of the predictor variables. 
Each one of the domain’s subregions, developed using a modification of a forward and 
backward recursive partitioning strategy, is associated with a product spline basis function. 
MARS approximates the unknown function $f(x)$ (in equation 1) using the set of product
spline basis functions associated with the overlapping subregions of the domain.

A significant feature of ASTAR when modeling time series data with periodic behav- 
ior is its ability to produce continuous models for the regression function with underlying 
sustained oscillations (limit cycles). An initial analysis of the yearly Wolf sunspot numbers 
using ASTAR produced several models with underlying limit cycles. When used for predic- 
tion, ASTAR models are a significant improvement over other existing nonlinear models of 
the Wolf sunspot numbers. 
B. RECURSIVE PARTITIONING (RP)

The origin of recursive partitioning regression modeling methodology appears to date 
to the development and use of the AID (Automatic Interaction Detection) program by 
Morgan and Sonquist in the early 1960’s. More recent extensions and contributions were 
made by Breiman et al. (1984). In Subsection 1 recursive partitioning (RP) is explained 
using recursive splitting of established subregions. In Subsection 2 recursive partitioning is 
then recast equivalently as an expansion in a set of basis functions. The latter explanation 
of recursive partitioning may be considered a precursor to MARS. 

1. RP: Recursive Splitting of Established Subregions 

Let the response variable $y$ depend in some unknown way on a vector of $p$ predictor
variables $x = (x_1, \ldots, x_p)$ that is modeled with (1). Assume there are $N$ samples of $y$ and
$x$, namely $\{y_i, x_i\}_{i=1}^{N}$. Let $\{R_j\}_{j=1}^{S}$ be a set of $S$ disjoint subregions of $D \subset \mathbb{R}^p$ such that
$D = \bigcup_{j=1}^{S} R_j$. Given the subregions $\{R_j\}_{j=1}^{S}$, recursive partitioning estimates the unknown
function $f(x)$ at $x$ with

$$\hat{f}(x) = \hat{f}_j(x) \quad \text{for } x \in R_j, \qquad (4)$$

where the function $\hat{f}_j(x)$ estimates the true but unknown function $f(x)$ over the subregion
$R_j$ of $D$. In recursive partitioning, $\hat{f}_j(x)$ is frequently taken to be a constant
function (Morgan and Sonquist, 1963; Breiman et al., 1984), although linear functions
have been proposed without much success (Breiman and Meisel, 1976). For the purpose of
explaining MARS, $\hat{f}_j(x)$ is taken to be a constant function,

$$\hat{f}_j(x) = c_j \quad \forall\, x \in R_j, \qquad (5)$$

where each $c_j$ is chosen to minimize the $j$th component of the residual squared error
(badness-of-fit),

$$\mathrm{BOF}[\hat{f}_j(x)] = \min_{c_j} \sum_{x_i \in R_j} \left( y_i - \hat{f}_j(x_i) \right)^2. \qquad (6)$$

Since the subregions of the domain $D$ are disjoint, each $c_j$ is the sample mean of the
$y_i$'s whose $x_i \in R_j$.
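Equations (4) to (6) can be made concrete in a few lines: given a region label for each sample, the minimizing constant in each region is the sample mean of its responses, and the total badness-of-fit is the sum of the within-region squared errors. The region labels below stand in for a hypothetical, already computed partition.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def fit_constants(y: Sequence[float], region: Sequence[int]) -> Tuple[Dict[int, float], float]:
    """Return ({j: c_j}, BOF) for a piecewise-constant fit on disjoint regions.

    region[i] = j means sample i lies in R_j. Each c_j is the mean of the y_i
    in R_j, which minimizes that region's residual sum of squares, eq. (6).
    """
    groups: Dict[int, List[float]] = defaultdict(list)
    for yi, j in zip(y, region):
        groups[j].append(yi)
    c = {j: sum(ys) / len(ys) for j, ys in groups.items()}
    bof = sum((yi - c[j]) ** 2 for yi, j in zip(y, region))
    return c, bof


y = [1.0, 2.0, 3.0, 10.0, 12.0]
region = [1, 1, 1, 2, 2]            # R_1 holds the first three samples, R_2 the rest
c, bof = fit_constants(y, region)
# c == {1: 2.0, 2: 11.0}; bof == 1 + 0 + 1 + 1 + 1 == 4.0
```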
In general, the recursive partitioning model is the result of a 2-step procedure that
starts with the single subregion $R_1 = D$. The first, or forward, step of the algorithm uses
recursive splitting of established subregions to iteratively produce a large number of disjoint
subregions $\{R_j\}_{j=1}^{M}$, where $M$ is chosen by the user. The second, or backward,
step of the algorithm reverses the first step and trims the excess $M - S$ subregions from
the model using a criterion that evaluates both the model fit and the number of subregions
in the model. The goal of the 2-step procedure is to use the data to select a good set of
subregions $\{R_j\}_{j=1}^{S}$ together with the constants $\{c_j\}_{j=1}^{S}$ that estimate $f(x)$ over each
subregion of the domain.

To facilitate understanding of the recursive partitioning algorithm, we examine
the forward-step procedure for an example problem using $p = 3$ predictor variables and
$M = 5$, the maximum number of forward-step subregions. Let $v = 1, \ldots, p$ index the
predictor variables and $k = 1, \ldots, n$ index the ordered sample values of a predictor variable
$x_v$ in subregion $R_j$. For our purposes we use $\mathrm{BOF}_m = \sum_{i=1}^{N} (y_i - \hat{f}(x_i))^2$ as the forward-step
measure of fit for a recursive partitioning model with $m$ subregions, and we restrict the
set of candidate partition points to the actual sample values $x_{v,k}$. Note that $x_{v,k}$ represents
the $k$th ordered sample value of the $v$th predictor variable, while $x_v$ alone denotes
the running values of the $v$th predictor variable. At the start of the forward-step recursive
partitioning algorithm, $R_1$ is the entire domain $D$ and the single-subregion estimate for
$f(x)$ is

$$\hat{f}(x) = \hat{f}_1(x) = c_1 = \frac{1}{N} \sum_{i=1}^{N} y_i. \qquad (7)$$

The forward-step measure of fit for the single-subregion recursive partitioning model is

$$\mathrm{BOF}_1 = \sum_{i=1}^{N} (y_i - c_1)^2. \qquad (8)$$

The initial recursion, $m = 2$, of the forward-step algorithm selects a partition
point $t^*$ that best splits subregion $R_1$ into two disjoint sibling subregions. The method for
discovering $t^*$ is a straightforward exhaustive search: evaluate every sample value $x_{v,k}$ (for
$v = 1, \ldots, p$; $k = 1, \ldots, n$) as a candidate partition point to determine which one minimizes
the remaining badness-of-fit for an $m = 2$ subregion model. For example, let $t = x_{1,45}$ identify
a candidate partition point for predictor variable $x_1$. The area in parent subregion $R_1$ to
the left of $t$, $x_1 < t$, resides in proposed sibling subregion $R_{1,l}$. The area to the right of $t$,
$t \le x_1$, resides in proposed sibling subregion $R_{1,r}$. Given the proposed split of $R_1$ along
$t = x_{1,45}$, we evaluate the model using $\mathrm{BOF}_m$ for an $m = 2$ subregion model, i.e.,

$$\mathrm{BOF}_2 = \min_{c_{1,l}} \sum_{x_i \in R_{1,l}} (y_i - c_{1,l})^2 + \min_{c_{1,r}} \sum_{x_i \in R_{1,r}} (y_i - c_{1,r})^2. \qquad (9)$$

Using the indices $v$ and $k$, the exhaustive search sequentially evaluates all possible partition
points for each predictor variable in $R_1$ (which here is equal to $D$).
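The exhaustive search for the first split can be sketched directly from (9): for every predictor $v$ and every sample value $t$, split the samples into $x_v < t$ and $x_v \ge t$, fit a constant to each side, and keep the split with the smallest $\mathrm{BOF}_2$. This brute-force version costs $O(pN^2)$ and is meant only to mirror the description; the tie-breaking rule (first best found), the skipping of splits that leave a side empty, the toy data, and the names are assumptions of the sketch.

```python
from typing import Optional, Sequence, Tuple


def sse(values: Sequence[float]) -> float:
    """Residual sum of squares about the mean (the optimal constant fit)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_first_split(X: Sequence[Sequence[float]],
                     y: Sequence[float]) -> Tuple[float, Optional[int], Optional[float]]:
    """Search every predictor v and sample value t for the split minimizing BOF_2."""
    best = (sse(y), None, None)               # (BOF, v*, t*); no split gives BOF_1
    for v in range(len(X[0])):
        for t in sorted({row[v] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[v] < t]
            right = [yi for row, yi in zip(X, y) if row[v] >= t]
            if not left or not right:         # skip splits that leave a sibling empty
                continue
            bof = sse(left) + sse(right)
            if bof < best[0]:
                best = (bof, v, t)
    return best


# Three predictors; y jumps when the second predictor (index 1) reaches 5.
X = [[0.3, 1, 7], [0.9, 2, 1], [0.1, 3, 4], [0.5, 5, 2], [0.7, 6, 9], [0.2, 8, 3]]
y = [0.0, 0.1, 0.0, 3.0, 3.1, 3.0]
bof2, v_star, t_star = best_first_split(X, y)   # v_star == 1, t_star == 5
```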

For our example problem, let the partition point $t^* = x_{2,25}$ identify the split of
subregion $R_1$ that minimizes the forward-step fit criterion $\mathrm{BOF}_m$ for an $m = 2$ subregion
recursive partitioning model. We use $x_{2,25}$ to create two new disjoint subregions during the
split and elimination of the old parent region, which we now call $R_{1'}$. First, the area in
parent subregion $R_{1'}$ to the left of $t^*$ (i.e., $x_2 < t^*$) is assigned to sibling subregion $R_2$, while
the area to the right of $t^*$ (i.e., $t^* \le x_2$) is reconstituted as subregion $R_1$. The creation of the
two new disjoint subregions $R_1$ and $R_2$, and the elimination of the old parent subregion $R_{1'}$,
increase by one the number of disjoint subregions that partition $D$, completing the initial
recursion of the forward-step procedure. Thus, the two-subregion recursive partitioning
estimate of $f(x)$ for our example problem is

$$\hat{f}(x) = c_j \quad \text{if } x \in R_j, \quad j = 1, 2, \qquad (10)$$

where (since we are splitting the domain $D$ on only $x_2$'s dimension)

$$x \in \begin{cases} R_1 & \text{if } x_2 \ge x_{2,25}, \\ R_2 & \text{if } x_2 < x_{2,25}. \end{cases}$$

Note that the form of the recursive partitioning model (4) did not change during the
recursion; only the number of disjoint subregions that partition $D$ changed.

The recursions $m = 3, \ldots, M = 5$ of the forward-step algorithm repeat the first
recursion with one exception. The exhaustive search is now conducted to identify the
best split (minimizing $\mathrm{BOF}_m$) for one and only one of the subregions from the current
$(m - 1)$-subregion model. Each recursion's partition point $t^*$ is selected as before, after
an evaluation of all potential partition points for each predictor variable in the existing
subregions $\{R_j\}_{j=1}^{m-1}$ of the model. The recursive splitting continues until the domain $D$ is
partitioned into $M = 5$ disjoint subregions $\{R_j\}_{j=1}^{5}$. Upon completion of the forward-step
recursive partitioning algorithm, a backward-step algorithm trims excess subregions using a
criterion that evaluates both fit and the number of subregions in the model (see Friedman
(1988) for a discussion of the backward-step algorithm). Completion of the backward-step
procedure results in the final recursive partitioning model with subregions $\{R_j\}_{j=1}^{S}$
($S \le M$).
2. RP: An Expansion in a Set of Basis Functions 

While the initial approach to understanding recursive partitioning is through 
recursive splitting, it is recast now in an equivalent form to provide a reference for explaining 
the MARS methodology. The central idea is to formulate the recursive partitioning model 
as an additive model of functions from disjoint subregions. Also, we associate the operation 
of subregion-splitting with the operation of step-function multiplying. The new approach 
approximates the unknown function /(x) at x with an expansion in a set of basis functions 
from disjoint subregions {Rj}f^i^ 

( 11 ) 

j=i 

where 



Bj{x) = I [xe Rj ] , 

and $I[\cdot]$ is an indicator function with value 1 if its argument is true and 0 otherwise. The constant $c_j$ estimates the true but unknown function $f(x)$ over the subregion $R_j$ of $D$, and $B_j(x)$ is a basis function that indicates membership in $R_j$. We call $B_j(x)$ a basis function because it restricts the contribution of $c_j$ to $\hat f(x)$ to those values of $x$ in $R_j$. The approximations of the unknown function $f(x)$ at $x$ in (4) and (11) are equivalent. The subregions $\{R_j\}_{j=1}^{S}$ are the same disjoint subregions of the domain $D$, and the constants $\{c_j\}_{j=1}^{S}$ are the same constant functions that estimate $f(x)$ over each subregion.
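The Python sketch below is a small illustration of the basis-function form (11). It evaluates $\hat f(x) = \sum_j c_j B_j(x)$ for axis-aligned box subregions. Three things here are assumptions made for the example and are not part of the original formulation: the function name `rp_predict`, the box representation, and the use of half-open intervals (which keep adjacent boxes disjoint).

```python
import numpy as np

def rp_predict(x, boxes, coefs):
    """Evaluate f_hat(x) = sum_j c_j * B_j(x), with B_j(x) = I[x in R_j], eq. (11).

    Each subregion R_j is a list of (low, high) bounds, one pair per predictor,
    read as half-open intervals [low, high) so that adjacent boxes stay disjoint.
    """
    x = np.asarray(x, dtype=float)
    total = 0.0
    for box, c in zip(boxes, coefs):
        # B_j(x) = 1 only if every coordinate lies inside its interval
        inside = all(lo <= xi < hi for xi, (lo, hi) in zip(x, box))
        total += c * float(inside)
    return total

# Hypothetical two-region model on [0, 2) x [0, 2), split on x_1 at 1.
boxes = [[(0.0, 1.0), (0.0, 2.0)], [(1.0, 2.0), (0.0, 2.0)]]
coefs = [2.0, 5.0]
print(rp_predict([0.5, 1.5], boxes, coefs))  # 2.0
print(rp_predict([1.2, 0.1], boxes, coefs))  # 5.0
```

Because the subregions are disjoint, exactly one indicator is nonzero for any $x$ in the domain. The prediction is therefore simply the constant of the subregion containing $x$.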
During each search for a partition of a subregion $R_j$ using an expansion in a set of basis functions (11), the selection of a candidate partition point creates a particular functional form for $\hat f(x)$. We call this form $g$ in the following algorithm. Let

$$H[\eta] = \begin{cases} 1 & \text{if } \eta \ge 0 \\ 0 & \text{otherwise} \end{cases} \qquad (12)$$

be a step function (it returns a value of 1 if $\eta$ is not negative, and 0 otherwise). Following Friedman (1988), an algorithm to implement the forward-step recursive partitioning procedure using an expansion in a set of basis functions is:



Recursive Partitioning Algorithm (Forward-Step) (13)

    R_1 = D,  B_1(x) = 1                                                        (a)
    For each subregion R_m, m = 2 to M do:                                      (b)
        bof* = ∞,  j* = 0,  v* = 0,  t* = 0                                     (c)
        For each established subregion R_j, j = 1 to m − 1 do:                 (d)
            For each predictor variable x_v in R_j, v = 1 to p do:              (e)
                For each data value in R_j, t = x_{v,k=1} to x_{v,k=n} do:      (f)
                    g = Σ_{i=1, i≠j}^{m−1} c_i B_i(x) + c_m B_j(x) H[t − x_v] + c_j B_j(x) H[x_v − t]   (g)
                    bof = BOF_m                                                 (h)
                    if bof < bof* then bof* = bof; j* = j; v* = v; t* = t end if   (i)
                end for
            end for
        end for
        R_m ← {R_{j*} : (t* − x_{v*}) > 0}                                       (j)
        R_{j*} ← {R_{j*} : (x_{v*} − t*) > 0}                                    (k)
    end for
    end

The forward-step recursive partitioning algorithm is initialized with the first subregion $R_1$ equal to the entire domain $D$ (13a). The outer loop (13b) controls the iterative creation of the subregions $\{R_m\}_{m=2}^{M}$. Next, the dummy variables (13c) are initialized in preparation for identifying the next partition of an established subregion $R_j$. These hold the value of the fit procedure $bof^*$, the region $j^*$, the predictor variable $v^*$, and the partition point $t^*$. The three nested inner loops (13d-13f) perform the exhaustive search for the next partition point. They iterate across all established subregions (13d), all predictor variables (13e), and all values of the predictor variables in the $j$th subregion (13f). Given the investigation of a partition point $t$ for a predictor variable $x_v$ in subregion $R_j$, the function $g$ (13g), with parameter vector $c = (c_1, \ldots, c_m)$, is the current candidate for a recursive partitioning model estimate of $f(x)$ in the $m$th iteration of the forward-step procedure. The first term in (13g) includes all subregions except subregion $R_j$. The last two terms in (13g),

$$c_m B_j(x) H[t - x_v] + c_j B_j(x) H[x_v - t],$$

reflect the proposal to divide the parent subregion $R_j$ into two disjoint sibling subregions. The step functions $H[t - x_v]$ and $H[x_v - t]$ identify each $x$'s location with respect to the partition point $t$. Next, $BOF_m$ (13h) is the forward-step measure of fit that evaluates the function $g$ with respect to the data. The best partition point, predictor variable, and subregion discovered so far are retained (step 13i) as the search continues for the best partition of an established subregion in the $m$th iteration. Completion of the $m$th iteration's search divides (and eliminates) the old parent subregion $R_{j^*}$ into two disjoint sibling subregions (13j and 13k), based on $x_{v^*}$'s location with respect to the partition point $t^*$. The iterations continue until the domain $D$ is partitioned into $M$ disjoint subregions $\{R_j\}_{j=1}^{M}$.
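The exhaustive search in (13) can be sketched in Python as follows. The sketch assumes constant fits in each subregion and takes $BOF_m$ to be the residual sum of squares; the text does not fix $BOF_m$ here. Because the subregions are disjoint, the least-squares constants are the subregion means, so a candidate split of $R_j$ changes only that subregion's contribution to the fit. Several choices are illustrative assumptions: the names `sse` and `rp_forward`, the boolean-mask representation of subregions, and the tie convention that sends points with $x_v = t$ to the new subregion.

```python
import numpy as np

def sse(y):
    """Residual sum of squares about the mean (0 for an empty set)."""
    return float(((y - y.mean()) ** 2).sum()) if y.size else 0.0

def rp_forward(X, y, M):
    """Forward-step recursive partitioning with constant fits, after (13).

    Subregions are boolean masks over the n data rows; BOF_m is taken to be the
    residual sum of squares of the piecewise-constant least-squares fit.
    """
    n, p = X.shape
    regions = [np.ones(n, dtype=bool)]              # (a) R_1 = D
    splits = []                                     # chosen (j*, v*, t*) per step
    while len(regions) < M:                         # (b) create R_m, m = 2..M
        base = sum(sse(y[r]) for r in regions)
        best = (np.inf, None, None, None)           # (c) bof*, j*, v*, t*
        for j, r in enumerate(regions):             # (d) established subregions
            for v in range(p):                      # (e) predictor variables
                for t in np.unique(X[r, v]):        # (f) data values in R_j
                    left = r & (X[:, v] <= t)
                    right = r & (X[:, v] > t)
                    if not left.any() or not right.any():
                        continue                    # skip splits with an empty child
                    # (g)-(h): only R_j's contribution to the RSS changes
                    bof = base - sse(y[r]) + sse(y[left]) + sse(y[right])
                    if bof < best[0]:               # (i) keep the best split so far
                        best = (bof, j, v, t)
        _, j, v, t = best
        if j is None:
            break                                   # no admissible split remains
        r = regions[j]
        regions.append(r & (X[:, v] <= t))          # (j) new subregion R_m
        regions[j] = r & (X[:, v] > t)              # (k) parent R_j* is replaced
        splits.append((j, v, float(t)))
    return regions, splits
```

Replacing the parent mask in step (k) is what makes the subregions disjoint. It is also the source of the drawbacks listed below.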

Each basis function $B_j(x)$ identifies membership in the subregion $R_j$ of $D$. It is the product of the step functions whose partition points define the subregion $R_j$. For example, let $D \subset \mathbb{R}^2$ and let $R_5$ be a subregion formed from the sequence of step functions $H[x_1 - t_1]$, $H[t_2 - x_2]$, $H[x_2 - t_3]$ and $H[t_4 - x_1]$, where $(t_1, t_2, t_3, t_4) = (0, 1, 0, 1)$. Then the basis function $B_5(x)$ is

$$B_5(x) = H[x_1 - 0] \times H[1 - x_2] \times H[x_2 - 0] \times H[1 - x_1], \qquad (14)$$

which delineates the subregion $R_5$ as a unit square in $\mathbb{R}^2$: $B_5(x) = 1$ if $0 \le x_1 \le 1$ and $0 \le x_2 \le 1$, and 0 otherwise.
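The unit-square example (14) can be checked numerically. This minimal sketch uses the convention $H[\eta] = 1$ for $\eta \ge 0$ from (12), so the square's edges belong to $R_5$. The function names `H` and `B5` are hypothetical.

```python
def H(eta):
    """Step function (12): 1 if eta is not negative, 0 otherwise."""
    return 1.0 if eta >= 0 else 0.0

def B5(x1, x2):
    """Basis function (14): product of the four step functions bounding R_5."""
    return H(x1 - 0) * H(1 - x2) * H(x2 - 0) * H(1 - x1)

print(B5(0.5, 0.5))   # 1.0, inside the unit square
print(B5(1.5, 0.5))   # 0.0, since x1 > 1
print(B5(0.5, -0.1))  # 0.0, since x2 < 0
```

A single zero factor is enough to switch the whole product off. This is how successive splits carve out ever smaller subregions.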
In recursive partitioning, the subregions are disjoint, and each data point $x$ belongs to only one subregion $R_j$. Therefore, the estimate of $f(x)$ over each subregion is restricted to a single functional form $f_j(x)$, which in this discussion is the constant function $c_j$. However, as we will address in Section D, MARS has overlapping subregions. The estimate of $f(x)$ over a subregion $R_j$ may then be obtained as a sum of multiple functional forms.

Recursive partitioning is a very powerful regression modeling methodology that is rapidly computed, especially if $f_j(x)$ is a constant function $c_j$. Each forward step of the algorithm (13) partitions one and only one subregion of the domain on an influential variable $x_{v^*}$. This procedure increasingly localizes the activity of the predictor variables with respect to the response variable $y$. However, there are several drawbacks to using recursive partitioning as a regression modeling technique:

• Recursive partitioning models have disjoint subregions, giving rise to discontinuities at subregion boundaries. This is disconcerting if we believe $f(x)$ is continuous.

• Recursive partitioning has an innate inability to adequately estimate functions $f(x)$ that are linear or additive. The cause is the recursive division of established subregions during the forward-step procedure. This division automatically produces predictor variable interactions, i.e., terms of the form $c\,x_i x_j$, unless all successive partitions occur on the same predictor variable.

• The recursive partitioning model (11) is an additive combination of functions of predictor variables in disjoint regions. This form makes estimation of the true form of the unknown function $f(x)$ difficult for large $p$.

C. REGRESSION SPLINES 

The development of a regression spline model offers another route to explaining MARS. Silverman (1985) views spline functions as an attractive approach to modeling, one that may be thought of as spanning parametric and nonparametric regression methodology. For simplicity, define a $q$th-order polynomial function in the unknown $x \in D \subset \mathbb{R}^1$ with coefficients $c_l$ as follows:

$$P_q(x) = \sum_{l=0}^{q} c_l x^l \quad \text{for } x \in D. \qquad (15)$$

The polynomials in (15) are smooth and easy to manipulate. However, global fitting of data with a polynomial model may require higher-order terms that exhibit unacceptable fluctuations. This observation leads us to divide the domain $D$ into smaller subregions $R_j$, which permits the use of (different) polynomial functions of relatively low order within each subregion.

Let $[a, b] = D \subset \mathbb{R}^1$ and let $\Delta_S = \{t_1, \ldots, t_{S-1}\}$ denote an ordered partition of $[a, b]$ into $S$ disjoint subregions, $a = t_0 < t_1 < \cdots < t_{S-1} < t_S = b$. Denote each disjoint subregion by $R_j = [t_{j-1}, t_j]$, for $j = 1, \ldots, S$. Let $C^{q-1}[D]$ represent the set of all continuous functions on $D$ whose first $q - 1$ derivatives are also continuous. Using $j$ as a subscript to index the subregions, we define a spline function as a set of $S$ piecewise $q$th-order polynomial functions whose function values and first $q - 1$ derivatives agree at their partition points, i.e.,

$$s_{\Delta_S}(x) = P_{q,j}(x) \quad \text{for } x \in R_j,\; j = 1, \ldots, S, \qquad (16)$$

with the restriction that $s_{\Delta_S} \in C^{q-1}[D]$.

There are several approaches for implementing splines within a regression setting (Wegman and Wright, 1983). One approach is to write the regression model (1) as a piecewise regression spline model,

$$y = s_{\Delta_S}(x) + \epsilon, \qquad (17)$$

where $\epsilon$ is assumed to have mean zero and variance $\sigma^2$ and to be independent of $s_{\Delta_S}(x)$. Moreover, $s_{\Delta_S}(x)$ estimates $f(x)$ according to (16).

Given a set of partition points $\Delta_S$, Smith (1979) has shown that a different and more useful regression spline model may be written using plus ($+$) functions. The plus function is defined as

$$(u)_+ = \begin{cases} u & \text{if } u > 0 \\ 0 & \text{if } u \le 0. \end{cases} \qquad (18)$$



Again, let $[a, b] = D \subset \mathbb{R}^1$. However, we now let $\Delta_S^o = \{t_1, \ldots, t_{S-1}\}$ define an ordered partition of $[a, b]$ into $S$ overlapping subregions, and denote the $S$ overlapping subregions as $R_j = [t_{j-1}, t_S]$ for $j = 1, \ldots, S$. Let $l$ index the order of the polynomial terms in each subregion of the domain. Let $c_{jl}$ denote the coefficient for the $l$th term of the polynomial function in the $(j+1)$st subregion of a spline model. The use of plus functions results in a truncated regression spline model that is functionally equivalent to the piecewise regression spline model (16):

$$y = \sum_{l=0}^{q} c_{0l}\, x^l + \sum_{j=1}^{S-1} c_{jq}\, [(x - t_j)_+]^q + \epsilon, \qquad (19)$$

where $\epsilon$ is assumed to have mean zero and variance $\sigma^2$ and to be independent of $s_{\Delta_S}(x)$, and $q$ is assumed to be greater than or equal to one. Since the partition points of the set $\Delta_S^o$ are ordered, the number of overlapping truncated spline functions with nonzero values increases by one as we move to the right across each partition point $t_j$.

Figure 1 compares the different forms for ($q = 2$)-order piecewise (16) [top] and truncated (19) [bottom] spline functions. Both have a single partition point at $x = 1$ and equivalently define a curve $y = f(x)$. In the top plot, the curve $y$ over $0 \le x \le 2$ is defined by two disjoint 2nd-order polynomial functions, shown using different triangular symbols. One is shown as ▽ ▽ ⋯ in subregion $[0, 1)$ and the other as △ △ ⋯ in subregion $[1, 2]$. In the bottom plot, the curve $y$ in the first subregion $[0, 1)$ is also defined by a single 2nd-order polynomial function, shown as ▽ ▽ ⋯. However, in the second subregion $[1, 2]$ the curve $y$ is the sum of two overlapping 2nd-order polynomial functions. The first is the polynomial continuing from the first subregion, shown as ▽ ▽ ⋯, and the second is a truncated 2nd-order polynomial function, shown as △ △ ⋯. Both the piecewise (16) and truncated (19) spline functions equivalently define the curve $y$.

The key point of this section is that once the number and the values of the partition points $\{t_j\}_{j=1}^{S-1}$ are fixed, the $q$th-order truncated regression spline model (19) is a linear model. Its coefficients $c$ are determined by straightforward least-squares regression. Nevertheless, the major difficulty in implementing a $q$th-order regression spline model lies in choosing the number and values of the partition points.
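The sketch below makes this key point concrete. It builds the design matrix of the truncated model (19) for a fixed set of partition points and solves for the coefficients by ordinary least squares with NumPy. The data are noise-free values of the Figure 1 curve $8 + 7x + 12x^2 - 32[(x-1)_+]^2$ on $[0, 2]$. The helper name `truncated_power_basis` and the grid of 41 points are illustrative choices.

```python
import numpy as np

def truncated_power_basis(x, knots, q):
    """Design matrix for the q-th order truncated regression spline (19):
    columns 1, x, ..., x^q followed by [(x - t_j)_+]^q for each knot t_j."""
    x = np.asarray(x, dtype=float)
    cols = [x ** l for l in range(q + 1)]
    cols += [np.maximum(x - t, 0.0) ** q for t in knots]
    return np.column_stack(cols)

# Noise-free data from the Figure 1 spline on [0, 2] with a knot at x = 1.
x = np.linspace(0.0, 2.0, 41)
y = 8 + 7 * x + 12 * x ** 2 - 32 * np.maximum(x - 1.0, 0.0) ** 2
B = truncated_power_basis(x, knots=[1.0], q=2)
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
print(np.round(coef, 6))  # should be close to [8, 7, 12, -32]
```

With the knot fixed, the fit is one linear solve. Choosing the knot locations themselves is the hard, nonlinear part that MARS addresses.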

We have defined regression spline models in $\mathbb{R}^1$. The extension to higher dimensions, with $p > 1$ predictor variables, is usually accomplished through products of univariate spline functions. Nevertheless, regression using products of univariate spline functions suffers from the curse-of-dimensionality discussed previously. From the perspective of regression splines, MARS attempts to overcome the curse-of-dimensionality by using a modified recursive partitioning strategy to select partitions of the domain. This permits MARS to exploit the localized, low-dimensional structure of the data using linear (i.e., $q = 1$ order) truncated, multidimensional regression spline functions.

[Figure 1: Quadratic Regression Spline Functions. Top panel: Piecewise Spline with Partition Point (x = 1). Bottom panel: Truncated Spline with Partition Point (x = 1).]

Figure 1. The different forms for a piecewise (16) and truncated (19) spline function using ($q = 2$)-order splines over the region $D = [0, 2]$ with a single partition point at $x = 1$. In the top plot, the curve $y$ is defined by two different disjoint 2nd-order polynomial functions: one for the region $[0, 1)$, shown as ▽ ▽ ⋯, and one for the region $[1, 2]$, shown as △ △ ⋯. In the bottom plot, the curve $y$ in the region $[0, 1)$ is again defined by the single 2nd-order polynomial function $8 + 7x + 12x^2$, shown as ▽ ▽ ⋯. However, in the subregion $[1, 2]$, the curve $y$ is the sum of two overlapping 2nd-order polynomial functions. These are $8 + 7x + 12x^2$, which continues from the first subregion, and the truncated 2nd-order polynomial function $-32[(x - 1)_+]^2$, shown as △ △ ⋯.

D. FRIEDMAN’S INNOVATIONS FOR RECURSIVE PARTITIONING 

Recursive partitioning and regression splines have tremendous power for modeling in high-dimensional environments. Each approach also presents difficulties when applied. Recursive partitioning has discontinuities, variable interactions and poor model interpretation. Regression splines battle the curse-of-dimensionality and lack a methodology to optimally select their many parameters.

Two aspects of the recursive partitioning algorithm (13) contribute to the difficulties of its application in a high-dimensional setting. First, the iterative division and elimination of the parent region when creating its sibling subregions causes difficulty in estimating linear and additive functions. Second, the discontinuous nature of the step function $H[\eta]$ (12), applied in each linear regression of the forward-step recursive partitioning algorithm (13g), causes the lack of continuity. Together, these characteristics make interpretation of the recursive partitioning model difficult at best.

To overcome recursive partitioning's difficulty in estimating linear and additive functions, Friedman (1988) proposes that the parent region is not eliminated (as in recursive partitioning) during the creation of its sibling subregions. Thus, in future iterations both the parent and its sibling subregions are eligible for further partitioning. An immediate result of retaining parent regions is overlapping subregions of the domain. Also, each parent region may have multiple sets of sibling subregions. With this modification, recursive partitioning can produce linear models with the repetitive partitioning of the initial region $R_1$ by different predictor variables. Additive models with functions of more than one predictor variable can result from successive partitioning using different predictor variables. This modification also allows for multiple partitions of the same predictor variable from the same parent region.

The above modified recursive partitioning algorithm, in which the parent region is maintained, results in a class of models with greater flexibility than permitted in recursive partitioning. However, the modified approach is still burdened with the discontinuities caused by the step function $H[\eta]$. To alleviate this difficulty, Friedman proposes to replace the step function $H[\eta]$ in the model formulation step (13g) with linear (i.e., $q = 1$ order) regression splines in the form of left ($-$) and right ($+$) truncated splines. Let $r_m$ represent a 2-tuple associated with the subregion $R_m$. Its components identify the direction (left or right), the specific predictor variable, and the partition point used to create subregion $R_m$ from its parent region; the direction is carried by the sign of the variable index. Left and right truncated splines for creating the subregions $R_m$ and $R_{m+1}$ from the parent region $R_j$, with a partition point at $t$ in the domain of $x_v$, are defined as

$$T_{j,r_m}(x) = [(t - x_v)_+]^{q=1} = (t - x_v)_+ \quad\text{and}\quad T_{j,r_{m+1}}(x) = [(x_v - t)_+]^{q=1} = (x_v - t)_+, \qquad (20)$$

where $r_m = (-v, t)$ and $r_{m+1} = (+v, t)$. The additional subscripts $j$ and $m$, or $j$ and $m + 1$, provide a necessary audit trail for products of truncated splines when interactions are allowed among multiple predictor variables. Note that the truncated spline functions act in only one dimension, although their argument is a vector of predictor variables.

A modeling approach using linear truncated splines (20) creates a continuous approximating function $\hat f(x)$. Its first partial derivative is discontinuous at the partition points of each predictor variable in the model. The argument for using linear truncated splines (20) is that imposing continuity beyond that of $\hat f(x)$ itself gains little in flexibility and loses much in computational speed. Linear truncated splines allow rapid updating of the regression model and its coefficients during each exhaustive search for the next partition of an established subregion. Additional partitions may be placed to compensate for the loss of flexibility in using linear truncated splines to estimate $f(x)$ over a subregion of the domain.

Implementing the modifications proposed above avoids the identified difficulties of the recursive partitioning algorithm and results in the MARS algorithm. The MARS algorithm produces a linear ($q = 1$) truncated spline model (19) with overlapping subregions of the domain $D$. Each overlapping subregion of a MARS model is defined by the partition points of the predictor variables from an ordered sequence of linear truncated splines.
Define the product basis function $K_m(x)$ as the product of the ordered sequence of truncated splines associated with subregion $R_m$. The first term of every product basis function is $T_{0,r_1}(x) = 1$, the initialization function associated with $R_1$. Each additional truncated spline represents the iterative partitioning of a parent region into a sibling subregion. For example, assume the ordered sequence of subregions leading to the parent region $R_j = R_7$ is $(1, 3, 7)$, and that $R_7$ is split using $T_{7,r_m}(x)$ to create subregion $R_m$. The product basis function $K_m(x)$ associated with the subregion $R_m$ for this example is

$$K_m(x) = \underbrace{T_{0,r_1}(x) \times T_{1,r_3}(x) \times T_{3,r_7}(x)}_{K_7(x)} \times T_{7,r_m}(x), \qquad (21)$$

where $m > 7$.

Evaluating $K_m(x)$ at $x$ requires evaluating each truncated spline in the product basis function at $x$. If any of these truncated spline evaluations is zero, then $K_m(x) = 0$. Otherwise, $K_m(x)$ is the product of the truncated splines evaluated at $x$. For example, let $x \in \mathbb{R}^3$ and let the ordered sequence for $R_5$ be $(1, 2, 5)$, with $r_2 = (+2, 3)$ and $r_5 = (-3, 1)$. The product basis function associated with $R_5$ is

$$K_5(x) = T_{0,r_1}(x) \times T_{1,r_2}(x) \times T_{2,r_5}(x) = 1 \times (x_2 - 3)_+ \times (1 - x_3)_+ = \begin{cases} (x_2 - 3)(1 - x_3) & \text{if } x_2 > 3 \text{ and } x_3 < 1 \\ 0 & \text{otherwise.} \end{cases}$$

If $x = (5, 4, 0.5) \in R_5$, then $K_5(x) = 0.5$. If $x = (4, 3.5, 6) \notin R_5$, then $K_5(x) = 0$.
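The evaluation rule above can be sketched in Python for the $K_5(x)$ example. The helper `T` (a hypothetical name) implements a linear truncated spline (20). Its sign argument plays the role of the sign of $v$ in $r_m = (\pm v, t)$, and predictors are indexed from 1 as in the text.

```python
def T(x, v, t, sign):
    """Linear truncated spline (20) on predictor x_v (1-based index v):
    sign=+1 gives the right spline (x_v - t)_+, sign=-1 the left spline (t - x_v)_+."""
    return max(sign * (x[v - 1] - t), 0.0)

def K5(x):
    # K_5(x) = T_{0,r_1}(x) * T_{1,r_2}(x) * T_{2,r_5}(x), r_2 = (+2, 3), r_5 = (-3, 1)
    return 1.0 * T(x, 2, 3.0, +1) * T(x, 3, 1.0, -1)

print(K5((5, 4, 0.5)))   # 0.5, since (4 - 3) * (1 - 0.5)
print(K5((4, 3.5, 6)))   # 0.0, since the factor (1 - 6)_+ vanishes
```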

The level of interaction of the predictor variables associated with $R_j$ is the number of truncated splines (excluding $T_{0,r_1}(x)$) in the product basis function $K_j(x)$. A one-term product basis function represents a truncated linear relationship in its predictor variable, a two-term product basis function represents a truncated 2-way interaction, and so on. The number and level of interactions in a MARS model are limited only by the data and by the maximum level of interaction permitted in the MARS algorithm (an input parameter).

The MARS estimate of the unknown function $f(x)$ is

$$\hat f(x) = \sum_{j=1}^{S} c_j K_j(x), \qquad (22)$$

where $\hat f(x)$ is an additive function of the product basis functions associated with the subregions $\{R_j\}_{j=1}^{S}$. The partition points are, of course, parameters of the model. For a given set of product basis functions, however, their values are fixed. The MARS model (22) is then a linear model whose coefficients may be determined by straightforward least-squares regression.

As in recursive partitioning, the forward-step MARS algorithm iteratively adjusts the vector of coefficient values to best fit the data. At the same time, it identifies the subregions $\{R_j\}_{j=1}^{M}$, $M \ge S$, whose product basis functions approximate $f(x)$ based on the data at hand. And again, as in the recursive partitioning procedure, it makes sense to follow the forward-step procedure with a backward-step trimming procedure. This procedure removes the excess $(M - S)$ subregions whose product basis functions no longer contribute sufficiently to the accuracy of the model fit.

Because of its attractive computational properties, MARS uses residual-squared-error in the forward and backward steps of the algorithm to evaluate model fit and compare partition points. The backward fit criterion actually used for final model selection is a modified form of the generalized cross validation criterion (GCV) first proposed by Craven and Wahba (1979). The modified generalized cross validation criterion ($GCV^*$) used in a MARS model with subregions $\{R_j\}_{j=1}^{M}$ is

$$GCV^*(M) = \frac{\frac{1}{n}\sum_{i=1}^{n} \left[ y_i - \hat f_M(x_i) \right]^2}{\left[ 1 - C^*(M)/n \right]^2}, \qquad (23)$$

where $n$ is the number of observations. The numerator in $GCV^*$ is the average residual-squared-error and the denominator is a penalty term that reflects model complexity. $GCV^*$ differs from GCV in the computation of $C^*(M)$, a model complexity penalty function that is increasing in $M$. MARS needs this modification to account for the heavy use of the data in determining both the partition points and the coefficients of a final model. The use of other criteria, perhaps more suitable for time series applications, is examined in Chapter V.
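Below is a minimal Python sketch of the criterion. It assumes (23) has the standard GCV form, with $C^*(M)$ in place of the usual effective-parameter count. This section does not specify how $C^*(M)$ is computed, so the sketch takes it as an input rather than guessing a formula. The function name `gcv_star` is hypothetical.

```python
import numpy as np

def gcv_star(y, y_hat, complexity):
    """Modified GCV criterion (23): average squared residual divided by
    [1 - C*(M)/n]^2, where `complexity` is the value of C*(M)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    mse = np.mean((y - y_hat) ** 2)             # numerator: average residual-squared-error
    return mse / (1.0 - complexity / n) ** 2    # denominator: complexity penalty

print(gcv_star([1, 2, 3, 4], [0, 3, 2, 5], complexity=2.0))  # 4.0
```

The same residuals are penalized more heavily as $C^*(M)$ grows toward $n$. This is how the criterion trades fit against model size during the backward step.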



E. FORWARD STEP MARS ALGORITHM 

The MARS forward-step algorithm (24) results from applying the modifications addressed in Section D to the forward-step recursive partitioning algorithm (13). Again we initialize $R_1 = D$. However, in MARS we create two new subregions $R_m$ and $R_{m+1}$ and maintain the parent region $R_{j^*}$ during each partition. MARS also restricts each sequence of truncated splines to at most one partition per predictor variable, because a second partition would create a nonlinear spline function, i.e., one with $q > 1$. MARS enforces this restriction during the search for the next best partition of a subregion $R_j$. It excludes from consideration, as a source of partition points, any predictor variable already included in the product basis function $K_j(x)$. The most notable difference between the RP and MARS algorithms occurs in forming the MARS model. Again following Friedman (1988), two substitutions are made in the forward-step recursive partitioning algorithm (13g). The product basis functions $\{K_j(x)\}$ given at (21) replace the basis functions $\{B_j(x)\}$. The truncated splines $T_{j,r_m}(x)$ and $T_{j,r_{m+1}}(x)$ given at (20) replace the step functions $H[t - x_v]$ and $H[x_v - t]$ from equation (12).



MARS Forward Step Algorithm (24)

    R_1 = D,  T_{0,r_1}(x) = 1                                                  (a)
    For each subregion R_m, m = 2 to M do:                                      (b)
        bof* = ∞,  j* = 0,  v* = 0,  t* = 0                                     (c)
        For each established subregion R_j, j = 1 to m − 1 do:                 (d)
            For each predictor variable x_v in R_j, v = 1 to p, such that v ∉ K_j(x) do:   (e)
                For each data value in R_j, t = x_{v,k=1} to x_{v,k=n} do:      (f)
                    g = Σ_{i=1}^{m−1} c_i K_i(x) + c_m K_j(x) T_{j,r_m}(x) + c_{m+1} K_j(x) T_{j,r_{m+1}}(x)   (g)
                    bof = BOF_m                                                 (h)
                    if bof < bof* then bof* = bof; j* = j; v* = v; t* = t end if   (i)
                end for
            end for
        end for
        R_m ← {R_{j*} : (t* − x_{v*}) > 0}                                       (j)
        R_{m+1} ← {R_{j*} : (x_{v*} − t*) > 0}                                   (k)
        m ← m + 2                                                               (l)
    end for
    end
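The following Python sketch follows the structure of (24) under simplifying assumptions. $BOF_m$ is taken to be the residual sum of squares of an ordinary least-squares refit of all coefficients at every candidate; the fast updating mentioned in Section D is not attempted. Candidate partition points are the data values of $x_v$ where the parent basis function is nonzero. Each new pair of basis functions is appended while the parent is kept, and a variable already used in the parent is skipped. The names `hinge` and `mars_forward` are hypothetical, and no backward step is included.

```python
import numpy as np

def hinge(x, t, sign):
    """Linear truncated spline (20): sign=+1 -> (x - t)_+, sign=-1 -> (t - x)_+."""
    return np.maximum(sign * (x - t), 0.0)

def mars_forward(X, y, M):
    """Forward-step MARS sketch in the spirit of algorithm (24).

    Each basis function is stored as (values over the data, set of variables used).
    Returns the basis list and an audit trail of (parent j, v, sign, t) entries.
    """
    n, p = X.shape
    basis = [(np.ones(n), frozenset())]             # K_1(x) = T_{0,r_1}(x) = 1
    terms = [()]                                    # audit trail; empty for K_1
    while len(basis) + 2 <= M:                      # each step adds R_m and R_{m+1}
        B = np.column_stack([b for b, _ in basis])
        best = (np.inf, None)
        for j, (bj, used) in enumerate(basis):      # parents are never eliminated
            for v in range(p):
                if v in used:                       # one partition per variable per product
                    continue
                for t in np.unique(X[bj > 0, v]):
                    left = bj * hinge(X[:, v], t, -1)
                    right = bj * hinge(X[:, v], t, +1)
                    G = np.column_stack([B, left, right])
                    coef, *_ = np.linalg.lstsq(G, y, rcond=None)
                    bof = float(np.sum((y - G @ coef) ** 2))
                    if bof < best[0]:
                        best = (bof, (j, v, t, left, right))
        if best[1] is None:
            break
        j, v, t, left, right = best[1]
        used = basis[j][1] | {v}
        basis += [(left, used), (right, used)]
        terms += [(j, v, -1, float(t)), (j, v, +1, float(t))]
    return basis, terms
```

Refitting with `lstsq` at every candidate costs far more than the updating approach described in Section D. This is acceptable only for small illustrative data sets.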






To characterize this MARS forward-step procedure, we use the example discussed in Subsection 1 of Section B. It has $p = 3$ predictor variables and $M = 5$, the maximum number of subregions in the forward step. The MARS algorithm parallels the recursive partitioning algorithm except for the modifications discussed in Section D. At the start of the MARS forward-step algorithm for our example problem (step 24a), the initial subregion is again the entire domain, i.e., $R_1 = D$. The single-subregion MARS estimate of $f(x)$ is restricted to be identical to the recursive partitioning estimate,

$$\hat f(x) = c_1 K_1(x) = c_1 T_{0,r_1}(x) = c_1 = \frac{1}{n}\sum_{i=1}^{n} y_i. \qquad (25)$$

Again, let the exhaustive search in the first iteration of MARS identify the best partition of $R_1$ as $t^* = x_{2,25}$. Continuing, the three-subregion MARS estimate of $f(x)$ obtained at the second step (first partition at $t^* = x_{2,25}$), with $T_{0,r_1}(x) = 1$, is

$$\begin{aligned} \hat f(x) &= c_1 K_1(x) + c_2 K_2(x) + c_3 K_3(x) \\ &= c_1 T_{0,r_1}(x) + c_2 T_{0,r_1}(x)\, T_{1,r_2}(x) + c_3 T_{0,r_1}(x)\, T_{1,r_3}(x) \\ &= \underbrace{c_1}_{x \in D} + \underbrace{c_2 (t^* - x_2)_+}_{\text{nonzero if } x_2 < x_{2,25},\ x \in R_1} + \underbrace{c_3 (x_2 - t^*)_+}_{\text{nonzero if } x_2 > x_{2,25},\ x \in R_1}. \end{aligned} \qquad (26)$$

In the next iteration of the forward-step MARS algorithm, the best partition point will occur within the subregions $R_1$, $R_2$ or $R_3$. As in recursive partitioning, with one exception, it will be chosen after evaluation of all potential partition points for each predictor variable within the three subregions. The exception, as discussed previously, prevents another partition on $x_2$ in $R_2$ or $R_3$, because it would create a truncated spline function of order greater than 1. With $M = 5$, the forward step of the MARS algorithm will be complete after a second partition in $D$. The final forward-step MARS estimate of $f(x)$ for our example will include all terms in (26) and the two additional terms generated by the second partition. Besides the constant $K_1(x)$, the model will have four single-term product basis functions if the second partition occurs in $R_1$. It will instead have two single-term and two 2-way product basis functions if the second partition occurs in $R_2$ or $R_3$.

After the backward trimming procedure, the final MARS model retains the form of (22). Here $c_1$ is the coefficient of the product basis function $K_1(x)$, and the remaining terms are the coefficients and product basis functions that survive the MARS backward-step subregion deletion strategy. To provide insight into predictor variable relationships, we can rearrange the final MARS estimate of $f(x)$ in an ANOVA-style decomposition (Friedman, 1988),

$$\hat f(x) = c_1 + \sum_{V=1} c_j K_j(x) + \sum_{V=2} c_j K_j(x) + \cdots, \qquad (27)$$

where $V$ indexes the number of truncated splines (excluding $T_{0,r_1}(x)$) in the product basis function $K_j(x)$. This method identifies any and all contributions to $\hat f(x)$ by variables of interest. Product basis functions with the index $V = 1$ reflect truncated linear trends, those with the index $V = 2$ reflect truncated 2-way interactions, and so on. The ANOVA-style decomposition (27) identifies which variables enter the model, and whether they are purely additive or involved in interactions with other variables. Analysis of the ANOVA-style decomposition facilitates interpretation of the MARS model.
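The bookkeeping behind (27) amounts to grouping the surviving product basis functions by interaction level $V$ and by the variables they involve. The sketch below assumes a hypothetical representation in which each basis function label maps to the list of predictor variables appearing in its truncated splines. The labels and variable lists in the usage lines are illustrative only.

```python
from collections import defaultdict

def anova_groups(terms):
    """Group basis-function labels by (V, variables) as in decomposition (27).

    `terms` maps each label to the predictor variables of its truncated splines;
    V, the number of truncated splines excluding T_{0,r_1}, is the list length.
    """
    groups = defaultdict(list)
    for label, variables in terms.items():
        key = (len(variables), tuple(sorted(variables)))
        groups[key].append(label)
    return dict(groups)

# Hypothetical surviving terms after a second partition of R_2 on x_1.
terms = {"K1": [], "K2": [2], "K3": [2], "K4": [2, 1], "K5": [2, 1]}
for (V, variables), labels in sorted(anova_groups(terms).items()):
    print(V, variables, labels)
```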

F. NONLINEAR MODELING OF UNIVARIATE TIME SERIES USING MARS

As previously discussed in the introduction, most research in and applications of time series modeling and analysis focus on linear models. This is due to the maturity of the theory for linear time series and to the numerous studies and statistical packages that facilitate the use of linear time series models. However, nonlinear time-dependent systems that are not adequately handled by linear models abound. Using linear models to analyze these nonlinear systems may require invalid assumptions that could lead to erroneous or misleading conclusions. For these systems we need to consider general classes of nonlinear models that readily adapt to the precise form of a nonlinear system of interest (Priestley, 1988 and Tong, 1985).

An example of a nonlinear time series system is that of sea-surface temperatures and the associated wind velocity and direction. For the sea-surface temperatures alone, a specific example is the series analyzed by Breaker and Lewis (1985). A very clear nonlinearity in this time series is the abrupt, yearly spring transition to lower temperatures. The spring transition can be clearly seen in Figure 35 in Chapter IV, especially at about 2190 days. More particularly, the spring transition is strongly coupled with the wind direction, which shifts in the spring (Breaker and Lewis, 1988, p. 395). In addition, there is an effect of El Niño (a tropical warming) that occurs during some years. We return to this example in Chapter IV.

Let the predictor variables for the $\tau$th value in a time series $\{X_\tau\}$ be $X_{\tau-1}, X_{\tau-2}, \ldots, X_{\tau-p}$. Combining these predictor variables into a linear additive function gives the well-known linear AR($p$) time series models (Priestley, 1988). What happens if we use the MARS methodology to model the effect of $X_{\tau-1}, \ldots, X_{\tau-p}$ on $X_\tau$? The answer is that we still obtain autoregressive models. However, these models can be nonlinear: the lagged predictor variables can have threshold terms, in the form of truncated spline functions (20), and these terms can interact with the nonlinear terms formed with other lagged predictor variables. The remainder of this chapter discusses the form and analysis of these nonlinear univariate time series models.
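In practice, using lagged values as MARS predictors only requires rearranging the series into a design matrix. This NumPy sketch (function name `lagged_design` is hypothetical) returns the rows $(X_{\tau-1}, \ldots, X_{\tau-p})$ with targets $X_\tau$ for $\tau = p+1, \ldots, N$. The first $p$ observations are lost because their lags are unavailable.

```python
import numpy as np

def lagged_design(series, p):
    """Return (X, target): row k of X holds X_{tau-1}, ..., X_{tau-p} and
    target[k] is X_tau, for tau = p+1, ..., N (1-based time index)."""
    x = np.asarray(series, dtype=float)
    N = x.size
    # column i-1 holds lag i, i.e. the series shifted back by i steps
    X = np.column_stack([x[p - i:N - i] for i in range(1, p + 1)])
    return X, x[p:]

X, target = lagged_design([1, 2, 3, 4, 5, 6], p=2)
print(X)       # rows (X_{tau-1}, X_{tau-2}): (2, 1), (3, 2), (4, 3), (5, 4)
print(target)  # X_tau: 3, 4, 5, 6
```

The resulting columns can then be treated as ordinary predictor variables by any MARS implementation.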

Threshold time series models (models with partition points) are a class of nonlinear models that emerge naturally as a result of changing physical behavior. Within the domain of the predictor variables, different model forms are necessary to capture changes in the relationship between the predictor and response variables; a simple example of a threshold model is given in equation (33). Tong (1983) provides one threshold modeling methodology for this behavior, TAR (Threshold Autoregression). TAR approximates a nonlinear function by linear pieces over disjoint subregions of the domain $D$ of the time series $\{X_\tau\}$, i.e., it identifies a linear model within each disjoint subregion of the domain. One application of Tong's threshold modeling methodology is to nonlinear systems thought to possess periodic behavior in the form of stationary sustained oscillations (limit cycles). Tong's threshold methodology has tremendous power and flexibility for modeling many time series. However, unless Tong's methodology is constrained to be continuous, it creates disjoint subregion models that are discontinuous at subregion boundaries.

With MARS, letting the predictor variables be lagged values of a time series admits a more general class of continuous nonlinear threshold models than Tong's TAR approach permits. The methodology for developing this class of nonlinear threshold models is called ASTAR (Adaptive Spline Threshold Autoregression). A simple example shows that this class of continuous nonlinear threshold models is indeed more general. Let $X_\tau$, for $\tau = 1, \ldots, N$, be a time series we wish to model with ASTAR using, for example, $p = 3$ lagged predictor variables, namely $X_{\tau-1}$, $X_{\tau-2}$ and $X_{\tau-3}$. Each forward step of the ASTAR algorithm selects one and only one set of new terms for the ASTAR model from the candidates specified by previously selected terms of the model. For our example problem, the sets of candidates in the initial forward step of the ASTAR algorithm are

$$\begin{aligned} &(X_{\tau-1} - t^*)_+ \text{ and } (t^* - X_{\tau-1})_+, \text{ or} \\ &(X_{\tau-2} - t^*)_+ \text{ and } (t^* - X_{\tau-2})_+, \text{ or} \\ &(X_{\tau-3} - t^*)_+ \text{ and } (t^* - X_{\tau-3})_+, \end{aligned} \qquad (28)$$

for some partition point (threshold) t* in the individual domain of the lagged predictor 
variables. For our example problem, assume that ASTAR selects the lagged predictor 
variable Xr-2 with threshold value t* = ti i.e., (Xt- 2 ~ ^i)+ and (ti -r Xt- 2 )+ are the 
initial terms (other than the constant) in the ASTAR model. The sets of candidates in the 
second forward step of the ASTAR algorithm includes all candidates in (28) and the new 
sets of candidates: 

{Xr-i - n+{Xr-2 - ti)+ and (r - X,-i)+(AV _2 - ti)+, or 
(Xt _3 — t*)^(Xr -2 — ^i)+ and (t* — Xt_3)+(Xt_ 2 — ^i)+t or 
(Xr-i - i’)+(<i - Xt_ 2 )+ and (f - XT_i)+(ti - Xr_ 2 )+, or 
{Xr-3 - n+iU - Xr-2)+ and (r - Xr_3)+(«i ~ Xr-2)+, (29) 

due to the initial selection of $(X_{\tau-2} - t_1)_+$ and $(t_1 - X_{\tau-2})_+$, where $t^*$ is to be
determined. Thus one could have multiple thresholds on one variable, say $X_{\tau-2}$, by again
selecting as the next set of model terms $(X_{\tau-2} - t^*)_+$ and $(t^* - X_{\tau-2})_+$ from (28) for some
partition point $t^* \neq t_1$. The set of candidates for each subsequent forward step of the
ASTAR algorithm is nondecreasing in size and is based on previously selected terms of the
model. As discussed in Section D, the forward-step algorithm is followed by a backward-step
algorithm that trims the excess $M - S$ terms from the model, where $S$ is the final
number of terms in the model, with $1 \le S \le M$.
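The candidate sets (28) and (29) can be generated mechanically. The sketch below is a minimal illustration under stated assumptions: each basis function is represented (hypothetically) as a tuple of hinge factors `(lag, sign, knot)`, with the empty tuple standing for the constant; a lag may enter a given product at most once (as in (29), where no candidate multiplies $X_{\tau-2}$ by itself); and a hypothetical `max_interaction` cap plays the role of the maximum interaction level. The knot $t^*$ is left as `None` because it is chosen afterwards by a search over the data.

```python
def candidate_pairs(basis, p, max_interaction=3):
    """Enumerate candidate reflected pairs for the next forward step.

    basis: basis functions already in the model; each is a tuple of hinge
           factors (lag, sign, knot), and () is the constant term.
    p:     number of lagged predictors X_{tau-1}, ..., X_{tau-p}.
    Returns (right, left) pairs: parent * (X_{tau-v} - t*)_+ and
    parent * (t* - X_{tau-v})_+, with the knot t* still undetermined.
    """
    pairs = []
    for parent in basis:
        if len(parent) >= max_interaction:
            continue  # parent is already at the maximum interaction level
        used_lags = {lag for lag, _, _ in parent}
        for v in range(1, p + 1):
            if v in used_lags:
                continue  # a lag appears at most once in a product
            right = parent + ((v, +1, None),)
            left = parent + ((v, -1, None),)
            pairs.append((right, left))
    return pairs


# First forward step: only the constant is in the model -> the 3 pairs of (28).
step1 = candidate_pairs([()], p=3)

# Second step after selecting X_{tau-2} with knot t1 (illustrative t1 = 1.0):
# the 3 pairs of (28) plus the 4 interaction pairs of (29).
t1 = 1.0
basis = [(), ((2, +1, t1),), ((2, -1, t1),)]
step2 = candidate_pairs(basis, p=3)
print(len(step1), len(step2))  # 3 7
```

The count grows from 3 to 7 because each newly selected hinge term becomes a parent for interactions with the lags it does not already contain, which is why the candidate set is nondecreasing in size.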

Let the predictor variables in MARS for the $\tau$th value in a time series $\{X_\tau\}$ be $X_{\tau-1}$,
$X_{\tau-2}, \ldots, X_{\tau-p}$, which we represent as $\{X_{\tau-i}\}_{i=1}^{p}$. Following (22), the functional form of the
ASTAR model that estimates $X_\tau$ is

$$\hat{X}_\tau = \sum_{l=1}^{S} c_l \, B_l(X_{\tau-1}, \ldots, X_{\tau-p}), \qquad (30)$$

where $\hat{X}_\tau$ is an additive function of the product spline basis functions $B_l$
associated with the subregions $R_l$, $l = 1, \ldots, S$. The functional form of the ASTAR model (30)
may be expanded using the ordered sequences of truncated spline functions (20) and (21)
that define each product spline basis function. Let $a$ and $b$ be dummy variables that index
the ordered sequence of truncated spline functions $T_{a,l}(X_{\tau-i})$ such that $0 < a < b < j$.
The functional form of the ASTAR model (30) for the $\tau$th value in a time series $\{X_\tau\}$ using
this expansion is

$$\hat{X}_\tau = \sum_{l=1}^{S} c_l \prod_{b} \left[\, s_v \,(X_{\tau-v} - t) \,\right]_+ , \qquad (31)$$

where the product runs over the truncated spline functions $T_{b,l}$ that define $B_l$, and the
argument of $T_{a,l}(X_{\tau-i})$ and $T_{b,l}(X_{\tau-i})$ is suppressed for simplicity.
Also $T_b = (\pm v, t)$ from (20), and $s_v$ is the sign of $v$ that determines a left ($-v$) or right
($+v$) truncated spline function.
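To make the expansion (31) concrete, here is a minimal sketch of evaluating a fitted model. It assumes a hypothetical representation in which the coefficients $c_l$ are held in a list and each basis function is a tuple of `(lag v, sign s, knot t)` factors, with the empty tuple for the constant basis function; the names are illustrative and are not part of MARS 2.0.

```python
from typing import Sequence, Tuple

Factor = Tuple[int, int, float]  # (lag v, sign s = +1 or -1, knot t)


def hinge(x: float, sign: int, knot: float) -> float:
    """Truncated linear spline [s (x - t)]_+."""
    return max(0.0, sign * (x - knot))


def astar_predict(coefs: Sequence[float],
                  basis: Sequence[Tuple[Factor, ...]],
                  lags: Sequence[float]) -> float:
    """Evaluate sum_l c_l * prod_b [s_b (X_{tau-v_b} - t_b)]_+.

    lags[i - 1] holds X_{tau-i}; an empty factor tuple is the constant basis.
    """
    total = 0.0
    for c, factors in zip(coefs, basis):
        term = c
        for v, s, t in factors:
            term *= hinge(lags[v - 1], s, t)
        total += term
    return total


# Illustrative model: 1 + 2 (X_{tau-2} - 1)_+ - 0.5 (3 - X_{tau-1})_+ (X_{tau-2} - 1)_+
coefs = [1.0, 2.0, -0.5]
basis = [(), ((2, +1, 1.0),), ((1, -1, 3.0), (2, +1, 1.0))]
print(astar_predict(coefs, basis, lags=[2.0, 4.0]))  # 5.5
```

A term contributes only when every one of its hinge factors is positive, which is how a single product basis function is confined to its subregion.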

By modeling univariate time series using ASTAR we overcome some of the limitations
of Tong’s approach. The ASTAR methodology creates threshold time series models that
are naturally continuous in the domain of the predictor variables, and it allows interactions
among lagged predictor variables. Also, the ASTAR time series model can have multiple
lagged predictor variable thresholds, e.g., the model (29) if the new partition point $t^* \neq t_1$.

In contrast, Tong’s methodology creates threshold models from piecewise linear models
whose terms are restricted to the initial sets of candidates of the ASTAR algorithm (equation
(28) for our example). Tong’s threshold models do not allow interactions among lagged
predictor variables and are usually limited to a single threshold value over all the lagged
predictor variables because of the difficulties associated with the threshold selection process.

An initial question is whether MARS is able to identify and model simple linear and
nonlinear time series models; if not, it would be of little value to use MARS with real data
of unknown structure. In the next two sections, simulation experiments are used to determine
the ability of MARS to detect and model simple linear and nonlinear time series models. The
simulation of an AR(1) model with known coefficients examines the ability of ASTAR to detect
and model a simple linear time series. The simulation of a threshold model with ‘AR(1)-like’
models in each disjoint subregion examines the ability of ASTAR to detect and model simple
nonlinear threshold time series. The interest in these simulations is two-fold: how often the
true model was identified and, when it was, how well the parameters $K$ and $\rho$ were estimated.
Finally, as a demonstration of the ability of ASTAR to model a real univariate time series
system, the last section of this chapter considers the widely studied yearly Wolf sunspot
numbers, a nonlinear time series with periodic behavior.

1. AR(1) Time Series Model Simulations 

The initial simulation experiments are of the first-order autoregressive (AR(1)) time series
model

$$X_\tau = \rho X_{\tau-1} + K + \epsilon_\tau, \qquad (32)$$

where $\tau = 1, 2, \ldots, N$ indexes the time series, $\rho$ is a constant coefficient varied across
experiments, $K$ is the model constant, taken to be zero, and $\epsilon_\tau$ is $N(0, \sigma^2)$. The model is
usually considered under the stationarity condition $|\rho| < 1$, but nonstationary processes
such as random walks ($|\rho| = 1$) and explosive processes ($|\rho| > 1$) are also of interest.

Two categories of experiments were conducted using the AR(1) time series model. 

The first experiment required ASTAR to identify and estimate the parameters of a
time series model from data simulated from the AR(1) time series model, using one lag
predictor variable $X_{\tau-1}$ and $M = 3$, the maximum number of subregions in the
forward-step ASTAR procedure. The first experiment’s alternative models (to the AR(1)
time series model) either have no $X_{\tau-1}$ term (a constant model) or have an $X_{\tau-1}$ term with
a threshold value $t$ greater than the minimum observed value of $X_{\tau-1}$. In this case we call
the threshold value $t$ an internal threshold.

The second experiment required ASTAR to identify and estimate the parameters of a
time series model from data simulated from the AR(1) time series model when up to four
lag predictor variables, $\{X_{\tau-i}\}_{i=1}^{4}$, are allowed, using $M = 8$, the maximum number
of subregions allowed in the forward-step ASTAR procedure. The second experiment’s
alternative models include constant models, time series models with an internal threshold
value, or any time series model that includes a term other than $X_{\tau-1}$.

Several simulation results are shown in Figures 2-7 for $\rho = .5$, $.7$ and $.9$, $K = 0$,
and $\epsilon_\tau = N(0,1)$. Each figure is a series of box plots, for increasing values of $N$, of the
estimated coefficients of those of the 100 simulated models that ASTAR correctly identified
as AR(1) time series models; the statistics for $\rho$ are given in the top set of box plots and
the statistics for $K$ in the bottom set. The true value of each model coefficient is identified
by the dashed line across the box plots. At the top of each figure are the length $N$ of each
simulated time series, the number $C$ of the 100 simulated models correctly identified by the
ASTAR procedure, and the equivalent sample size for independent data,
Eq S SIZE $= N(1-\rho)/(1+\rho)$ (Priestley, 1981). Underneath each box plot is summary
information for the coefficient estimates of the correctly identified AR(1) time series models,
i.e., the sample mean and sample standard deviation of the estimated values in the box
plots. Comparing the true and estimated values of the model coefficients across increasing
values of $N$, the estimated values of the coefficients tend to the true value as $N$ increases.
Also, in all but one simulation the number of correctly identified models $C$ rises to 100 for
increasing values of $N$. Note that the ASTAR estimates of $\rho$ have a negative bias for small
values of $N$ that generally decreases as $N$ increases. This downward bias of $\hat{\rho}$ is similar to
that identified by Kendall et al. (1983) and others when using data for estimating
autocorrelations.
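The Eq S SIZE values reported for $\rho = .7$ in Figure 4 at $N = 100$, 250 and 750 (17, 44 and 132) agree with $N(1-\rho)/(1+\rho)$ truncated to an integer, the effective number of independent observations for estimating the mean of a stationary AR(1) process. A small sketch of that calculation; truncation (rather than rounding) is an assumption inferred from those reported values.

```python
import math


def equivalent_sample_size(n, rho):
    """Equivalent independent sample size N(1 - rho)/(1 + rho), truncated."""
    if not -1.0 < rho < 1.0:
        raise ValueError("defined here only for a stationary AR(1), |rho| < 1")
    return math.floor(n * (1.0 - rho) / (1.0 + rho))


print([equivalent_sample_size(n, 0.7) for n in (100, 250, 750)])  # [17, 44, 132]
```

The ratio shrinks quickly as $\rho$ approaches 1, which is one reason the estimates for $\rho = .9$ need much longer series to stabilize.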

2. Nonlinear Threshold Time Series Model Simulations 

To observe the ability of ASTAR to capture nonlinear threshold time series model 
characteristics, simulation experiments of the 2-subregion threshold time series model (Tong, 
Figure 2. AR(1) MODEL SIMULATION: ASTAR estimates for $\rho = .5$, $K = 0$ using $\epsilon_\tau =
N(0, 1)$ from $C$ simulations of an AR(1) model for increasing values of $N$, with $P = 1$
lag predictor variable and $M = 3$, the number of forward-step subregions permitted
in the ASTAR algorithm. Each simulation consists of 100 replications. The box plots
are for the estimates $\hat{\rho}$ and $\hat{K}$ of the model parameters when ASTAR correctly
identified the AR(1) model. For $N = 100$, 2 simulations were incorrectly identified as
constant models.
Figure 3. AR(1) MODEL SIMULATION: ASTAR estimates for $\rho = .5$, $K = 0$ using $\epsilon_\tau =
N(0, 1)$ from $C$ simulations of an AR(1) model for increasing values of $N$, with $P = 4$
lag predictor variables and $M = 8$, the number of forward-step subregions permitted
in the ASTAR algorithm. Each simulation consists of 100 replications. The box plots
are for the estimates $\hat{\rho}$ and $\hat{K}$ of the model parameters when ASTAR correctly
identified the AR(1) model. For $N = 100$, 5 simulations were incorrectly identified: 2 as
constant models, 1 as an AR(2) model and 2 as AR(3) models. For $N = 500$, 2 simulations
were incorrectly identified as constant models.



[Figure 4 plot: box plots of the estimates of $\rho$ (top) and $K$ (bottom) for $X_\tau = \rho X_{\tau-1} + K + \epsilon_\tau$, $\tau = 1, 2, \ldots, N$]

N                      100       250       500       750
C                      100       100       100       100
Eq S SIZE              17        44        68        132
$\hat{\rho}$ Mean      0.67304   0.60267   0.68788   0.69701
$\hat{\rho}$ Std Dev   0.07068   0.04065   0.03157   0.02654
$\hat{K}$ Mean         0.00086   0.00047   0.00029   0.00012
$\hat{K}$ Std Dev      0.01310   0.00537   0.00262   0.00160

Figure 4. AR(1) MODEL SIMULATION: ASTAR estimates for $\rho = .7$, $K = 0$ using $\epsilon_\tau =
N(0, 1)$ from $C$ simulations of an AR(1) model for increasing values of $N$, with $P = 1$
lag predictor variable and $M = 3$, the number of forward-step subregions permitted
in the ASTAR algorithm. Each simulation consists of 100 replications. The box plots
are for the estimates $\hat{\rho}$ and $\hat{K}$ of the model parameters when ASTAR correctly
identified the AR(1) model. Here all cases were correctly identified.
Figure 5. AR(1) MODEL SIMULATION: ASTAR estimates for $\rho = .7$, $K = 0$ using $\epsilon_\tau =
N(0, 1)$ from $C$ simulations of an AR(1) model for increasing values of $N$, with $P = 4$
lag predictor variables and $M = 8$, the number of forward-step subregions permitted
in the ASTAR algorithm. Each simulation consists of 100 replications. The box plots
are for the estimates $\hat{\rho}$ and $\hat{K}$ of the model parameters when ASTAR correctly
identified the AR(1) model. For $N = 100$, 6 simulations were incorrectly identified: 2 as
AR(2) models, 2 as AR(3) models and 2 as AR(4) models.
Figure 6. AR(1) MODEL SIMULATION: ASTAR estimates for $\rho = .9$, $K = 0$ using $\epsilon_\tau =
N(0, 1)$ from $C$ simulations of an AR(1) model for increasing values of $N$, with $P = 1$
lag predictor variable and $M = 3$, the number of forward-step subregions permitted
in the ASTAR algorithm. Each simulation consists of 100 replications. The box plots
are for the estimates $\hat{\rho}$ and $\hat{K}$ of the model parameters when ASTAR correctly
identified the AR(1) model. Here all cases were correctly identified.
Figure 7. AR(1) MODEL SIMULATION: ASTAR estimates for $\rho = .9$, $K = 0$ using $\epsilon_\tau =
N(0, 1)$ from $C$ simulations of an AR(1) model for increasing values of $N$, with $P = 4$
lag predictor variables and $M = 8$, the number of forward-step subregions permitted
in the ASTAR algorithm. Each simulation consists of 100 replications. The box plots
are for the estimates $\hat{\rho}$ and $\hat{K}$ of the model parameters when ASTAR correctly
identified the AR(1) model. For $N = 100$, 3 simulations were incorrectly identified: 2 as
AR(2) models and 1 as an AR(3) model.
1983),

$$
X_\tau =
\begin{cases}
\rho_1 X_{\tau-1} + \epsilon_\tau & \text{if } X_{\tau-1} \le 0,\\
\rho_2 X_{\tau-1} + \epsilon_\tau & \text{if } X_{\tau-1} > 0,
\end{cases}
\qquad (33)
$$

were considered, where $\tau = 1, 2, \ldots, N$ indexes the time series, $\rho_1$ and $\rho_2$ are constant
coefficients varied for different experiments, and $\epsilon_\tau$ is $N(0, \sigma^2)$. This is the simplest
threshold model that has been proposed, and it provides a convenient starting point for initial
evaluation and validation of the ASTAR procedure. Note that the nonlinear threshold time series
model (33) has an ‘AR(1)-like’ model in each subregion, which implies that $X_\tau$ can have a
different variance in each of the two subregions, since the variance of $\epsilon_\tau$ is the same in both
regions while the autoregressive coefficients differ. Also, for a threshold at $X_{\tau-1} = 0$, the
expected number of sample values in each subregion will be the same only if $\rho_1 = -\rho_2$.
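A minimal sketch for simulating the two-regime model (33) with Python's standard library. The function name and the starting value `x0` are illustrative assumptions; `sigma` is the standard deviation of $\epsilon_\tau$, so the $N(0, .25)$ errors used in the experiments correspond to `sigma=0.5`. Returning the regime labels alongside the values makes it easy to count how many observations fall in each subregion.

```python
import random


def simulate_tar(n, rho1, rho2, sigma=0.5, x0=0.0, seed=None):
    """Simulate model (33): X_tau = rho1 * X_{tau-1} + eps if X_{tau-1} <= 0,
    and rho2 * X_{tau-1} + eps otherwise, with eps ~ N(0, sigma^2).

    Returns (values, regimes), where regimes[i] is 1 or 2 according to the
    subregion of the lagged value used to generate values[i].
    """
    rng = random.Random(seed)
    x, values, regimes = x0, [], []
    for _ in range(n):
        regime = 1 if x <= 0.0 else 2
        rho = rho1 if regime == 1 else rho2
        x = rho * x + rng.gauss(0.0, sigma)
        values.append(x)
        regimes.append(regime)
    return values, regimes


values, regimes = simulate_tar(500, rho1=-0.6, rho2=0.6, seed=7)
n_lower = regimes.count(1)  # observations generated from the X_{tau-1} <= 0 regime
```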

Two categories of experiments were conducted using the nonlinear threshold time 
series model. 

The first experiment required ASTAR to identify and estimate the parameters of a
time series model from data simulated from the nonlinear threshold time series model, using
one lag predictor variable $X_{\tau-1}$ and $M = 3$, the maximum number of subregions in
the forward-step ASTAR procedure. The first experiment’s alternative models include the
constant model, linear AR(1) time series models, or nonlinear time series models that have
more than one internal threshold.

The second experiment required ASTAR to identify and estimate the parameters of
a time series model from data simulated from the nonlinear threshold time series model
when up to four lag predictor variables, $\{X_{\tau-i}\}_{i=1}^{4}$, are allowed, using $M = 10$, the
maximum number of subregions allowed in the forward-step ASTAR procedure. The second
experiment’s alternative models include the constant model, linear and nonlinear time series
models with terms other than $X_{\tau-1}$, or nonlinear time series models with more than one
internal threshold value on $X_{\tau-1}$.

Several simulation results are shown in Figures 8-11 for $\rho_1, \rho_2 = .7, .3$ and $-.6, .6$,
and $\epsilon_\tau = N(0, .25)$. As with the previous AR(1) time series model simulation experiments,
each figure is a series of box plots, for increasing values of $N$, of the estimated coefficients of
those of the 100 simulated models that ASTAR correctly identified as nonlinear threshold time
series models. The true value of each model coefficient is identified by the dashed line across
the box plots. At the top of each figure are the length $N$ of each simulated time series and the
number $C$ of the 100 simulated models correctly identified by the ASTAR procedure. Underneath
each box plot is summary information for the coefficient estimates of the correctly identified
nonlinear threshold time series models, i.e., the sample mean and sample standard deviation
of the estimated values in the box plots. Note that the number of correctly identified models
rises for increasing values of $N$. However, a consistent improvement in the mean and
standard deviation of the estimated model coefficients is not always observed for increasing
values of $N$. For the most part this is attributed to the increasing number of correctly
identified models for increasing values of $N$.
Figure 8. THRESHOLD MODEL SIMULATION: ASTAR model estimates for $\rho_1, \rho_2 = .7, .3$
using $\epsilon_\tau = N(0, .25)$ from $C$ simulations of a threshold model for increasing values
of $N$, with $P = 1$ lag predictor variable and $M = 3$, the number of forward-step
subregions permitted in the ASTAR algorithm. Each simulation consists of 100
replications. The box plots are for the estimates $\hat{\rho}_1$ and $\hat{\rho}_2$ of the model parameters
when ASTAR correctly identified the threshold model. The models of the simulations that
ASTAR did not correctly identify as the threshold model contained an incorrect number
of subregions or lacked an AR(1) term in one of the two subregions.
Figure 9. THRESHOLD MODEL SIMULATION: ASTAR estimates for $\rho_1, \rho_2 = .7, .3$ using
$\epsilon_\tau = N(0, .25)$ from $C$ simulations of a threshold model for increasing values of $N$, with
$P = 4$ lag predictor variables and $M = 10$, the number of forward-step subregions
permitted in the ASTAR algorithm. Each simulation consists of 100 replications.
The box plots are for the estimates $\hat{\rho}_1$ and $\hat{\rho}_2$ of the model parameters when ASTAR
correctly identified the threshold model. The models of the simulations that ASTAR
did not correctly identify as the threshold model contained an incorrect number of
subregions, lacked an AR(1) term in one of the two subregions or contained terms
with $X_{\tau-2}$, $X_{\tau-3}$, or $X_{\tau-4}$.
Figure 10. THRESHOLD MODEL SIMULATION: ASTAR estimates for $\rho_1, \rho_2 = -.6, .6$ using
$\epsilon_\tau = N(0, .25)$ from $C$ simulations of a threshold model for increasing values of
$N$, with $P = 1$ lag predictor variable and $M = 3$, the number of forward-step
subregions permitted in the ASTAR algorithm. Each simulation consists of 100
replications. The box plots are for the estimates $\hat{\rho}_1$ and $\hat{\rho}_2$ of the model parameters
when ASTAR correctly identified the threshold model. The models of the simulations
that ASTAR did not correctly identify as the threshold model contained an incorrect
number of subregions or lacked an AR(1) term in one of the two subregions.
Figure 11. THRESHOLD MODEL SIMULATION: ASTAR estimates for $\rho_1, \rho_2 = -.6, .6$ using
$\epsilon_\tau = N(0, .25)$ from $C$ simulations of a threshold model for increasing values of
$N$, with $P = 4$ lag predictor variables and $M = 10$, the number of forward-step
subregions permitted in the ASTAR algorithm. Each simulation consists of 100
replications. The box plots are for the estimates $\hat{\rho}_1$ and $\hat{\rho}_2$ of the model parameters
when ASTAR correctly identified the threshold model. The models of the simulations
that ASTAR did not correctly identify as the threshold model contained an incorrect
number of subregions, lacked an AR(1) term in one of the two subregions or contained
terms with $X_{\tau-2}$, $X_{\tau-3}$, or $X_{\tau-4}$.
3. Threshold Modeling of the Yearly Wolf Sunspot Numbers

As an illustration of ASTAR’s ability to model an actual time series, we examined
221 of the yearly Wolf sunspot numbers (1700-1920). These yearly Wolf sunspot numbers
are relative measures of the average monthly sunspot activity on the surface of the sun (see, 
e.g., Scientific American, February 1990). The analysis was performed on the yearly sunspot 
numbers to facilitate comparison of the MARS methodology with other nonlinear time series 
modeling efforts (analysis of monthly sunspot numbers would also be of interest). Some 
of the early analysis and modeling of the yearly sunspot numbers was performed by Yule 
(1927) as an example for introducing autoregressive models. Recently suggested nonlinear 
models of the yearly sunspot numbers include threshold models (Tong, 1983, 1985) and 
bilinear models (Rao and Gabr, 1984). A detailed review of the history of the sunspot 
numbers is provided by Izenman (1983). 

The data (Figure 12) is quite ‘periodic’ but has nonsymmetric cycles with ex- 
tremely sharp peaks and troughs. The cycles (Table 1) generally vary between 10 and 
12 years with the greater number of sunspots concentrated in each descent period versus 
the accompanying ascent period. The average ascent period is 4.60 years and the average 
descent period is 6.58 years. Attempts to model the data with a fixed cycle period signal 
plus (possibly correlated) noise have failed because the cyclical component in the spectrum 
(Figure 14, top) is quite spread out and diffuse. 

TABLE 1. ASCENT AND DESCENT PERIODS (IN YEARS) OF THE YEARLY WOLF SUNSPOT
NUMBERS (1700-1920).

Ascent period     5  5  4  5  6  6  3  3  3  6  6
Descent period    7  6  6  6  5  5  6  6 11  6  7

Ascent (cont)     7  4  5  4  3  5  4  4  4
Descent (cont)    3  6  8  7  8  6  8  8
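As a quick check on the averages quoted in the text, the sketch below reads Table 1 as one period per entry, with a single 11-year descent; that reading (an assumption about the original layout) gives 20 ascents and 19 descents and reproduces the average ascent and descent periods of 4.60 and 6.58 years.

```python
ascent = [5, 5, 4, 5, 6, 6, 3, 3, 3, 6, 6, 7, 4, 5, 4, 3, 5, 4, 4, 4]
descent = [7, 6, 6, 6, 5, 5, 6, 6, 11, 6, 7, 3, 6, 8, 7, 8, 6, 8, 8]

mean_ascent = sum(ascent) / len(ascent)     # 92 / 20
mean_descent = sum(descent) / len(descent)  # 125 / 19
print(f"{mean_ascent:.2f} {mean_descent:.2f}")  # 4.60 6.58
```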
Figure 12. The yearly Wolf sunspot numbers (1700-1955). The data is quite ‘periodic’ but has
nonsymmetric cycles with extremely sharp peaks and troughs. The cycles generally 
vary between 10 and 12 years with the greater number of sunspots concentrated in 
each descent period versus the accompanying ascent period. 
a. Stable Periodic Limit Points in Threshold Models

One of the interesting characteristics of Tong’s analysis of the yearly sunspot
numbers was the development of threshold models with stationary harmonic behavior,
or limit cycles, i.e., models in which $f(X)$ is a deterministic threshold function with a limit
cycle, perturbed by Gaussian white noise. Following Tong (1983), let $\tau = 1, 2, \ldots$ index a time
series and let $\mathbf{X}_\tau = (X_\tau, X_{\tau-1}, \ldots, X_{\tau-p+1})^T$ denote a $p$-dimensional vector in $D \subseteq \mathbb{R}^p$ that
satisfies the equation

$$\mathbf{X}_\tau = \mathbf{f}(\mathbf{X}_{\tau-1}), \qquad (34)$$

where $\mathbf{f}$ is a vector-valued function. Let $\mathbf{f}^{(j)}(\mathbf{x})$ denote the $j$th iterate of $\mathbf{f}$, i.e.,

$$\mathbf{f}^{(j)}(\mathbf{x}) = \underbrace{\mathbf{f}(\mathbf{f}(\cdots \mathbf{f}}_{j \text{ of them}}(\mathbf{x})\cdots)). \qquad (35)$$

We say that a $p$-dimensional vector $\mathbf{x}^*$ is a stable limit point of the function $\mathbf{f}$ with respect
to the domain $D$ if

$$\mathbf{f}^{(j)}(\mathbf{x}_0) \to \mathbf{x}^* \ \text{ as } j \to \infty, \quad \forall\, \mathbf{x}_0 \in D. \qquad (36)$$

Also, we say that a $p$-dimensional vector $\mathbf{c}_1$ is a stable periodic limit point with period
$T > 1$ of the function $\mathbf{f}$ with respect to the domain $D$ if

$$\mathbf{f}^{(jT)}(\mathbf{x}_0) \to \mathbf{c}_1 \ \text{ as } j \to \infty, \quad \forall\, \mathbf{x}_0 \in D, \qquad (37)$$

and the convergence does not hold for any proper divisor of $T$. It follows that $\mathbf{c}_1$, $\mathbf{f}(\mathbf{c}_1)$,
$\mathbf{f}^{(2)}(\mathbf{c}_1), \ldots, \mathbf{f}^{(T-1)}(\mathbf{c}_1)$ are simultaneously distinct stable periodic points of the function
$\mathbf{f}$ with respect to $D$. If we denote $\mathbf{f}^{(i)}(\mathbf{c}_1)$ by $\mathbf{c}_{i+1}$, $i = 0, 1, \ldots, T-1$, then the set
$\{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_T\}$ is called a stable limit cycle of the function $\mathbf{f}$ with respect to $D$.
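Numerically, a limit cycle of the noise-free skeleton (34) can be looked for by iterating from an initial vector past a burn-in and recording the first time the state returns, within a tolerance, to where it was. The sketch below does this for a scalar lag function `g`, with the state stored as $(X_{\tau-1}, \ldots, X_{\tau-p})$. The burn-in, the tolerance and the toy threshold map are illustrative assumptions, and a return from a single starting point is evidence of, not proof of, the convergence for all $\mathbf{x}_0 \in D$ required by (36) and (37).

```python
def skeleton_step(state, g):
    """Map (X_{tau-1}, ..., X_{tau-p}) to (X_tau, ..., X_{tau-p+1}), X_tau = g(state)."""
    return (g(state),) + state[:-1]


def limit_cycle_period(g, init, burn_in=1000, max_period=500, tol=1e-8):
    """Smallest T <= max_period at which the state recurs after burn-in.

    T = 1 indicates a limit point; None means no recurrence was detected.
    """
    state = tuple(init)
    for _ in range(burn_in):
        state = skeleton_step(state, g)
    ref = state
    for T in range(1, max_period + 1):
        state = skeleton_step(state, g)
        if max(abs(a - b) for a, b in zip(state, ref)) < tol:
            return T
    return None


# Toy threshold skeleton (p = 1): X_tau = 1 + 0.5 X_{tau-1} if X_{tau-1} <= 0,
# and -1 + 0.5 X_{tau-1} otherwise. Neither regime has a fixed point inside its
# own subregion, and orbits settle onto the 2-cycle {-2/3, 2/3}.
def toy(s):
    return 1.0 + 0.5 * s[0] if s[0] <= 0.0 else -1.0 + 0.5 * s[0]


print(limit_cycle_period(toy, init=(0.0,)))  # 2
```

For a fitted ASTAR model, `g` would be the fitted function of the lagged values and `init` the initialization values, with the lengths of the ascent and descent phases read off the recovered cycle.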

b. ASTAR Models for the Yearly Wolf Sunspot Numbers 

The primary interest in limit cycles is for investigating the underlying
characteristics of the true time series function $f(X)$ given in (1). If the cyclical behavior of
$f(X)$ for the yearly sunspot numbers can be modeled as a limit cycle perturbed by Gaussian
white noise, then when applying ASTAR to the yearly sunspot numbers it would be
satisfying to identify an underlying limit cycle in the estimate of $f(X)$. With this objective
in mind, 20 ASTAR models of the yearly sunspot numbers were investigated. The period
of the modeling effort (1700-1920) corresponds to similar modeling efforts by Tong (1983,
1985) and Rao and Gabr (1984). The maximum order of each model (number of lagged
predictor variables) was restricted to 20, and the first 20 sunspot numbers (1700-1719) were
used for model initialization. The ASTAR models were identified with MARS 2.0 installed
on an IBM 3033 computer using VS Fortran; each model required from 15 to 30 seconds
of CPU time.
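Using the first 20 values for initialization follows from how lagged predictors are formed: with at most $p$ lags, the response $X_\tau$ can be paired with $(X_{\tau-1}, \ldots, X_{\tau-p})$ only once $p$ earlier values exist. A minimal sketch of building that regression data set (the function name is hypothetical); with the 221 yearly values for 1700-1920 and $p = 20$ it yields the 201 responses for 1720-1920 that are summarized in Table 2.

```python
def lagged_design(x, p):
    """Rows (X_{tau-1}, ..., X_{tau-p}) and responses X_tau, for tau = p+1, ..., N.

    The first p values of x are used only as lagged predictors (initialization).
    """
    rows, targets = [], []
    for tau in range(p, len(x)):  # 0-based index of the response
        rows.append([x[tau - i] for i in range(1, p + 1)])
        targets.append(x[tau])
    return rows, targets


rows, targets = lagged_design([1, 2, 3, 4, 5, 6], p=2)
# rows == [[2, 1], [3, 2], [4, 3], [5, 4]], targets == [3, 4, 5, 6]
```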

Table 2 provides a summary of the 20 ASTAR models for the yearly sunspot
numbers (1720-1920), ordered by the mean sum of squares (MSS) of the fitted residuals
for each ASTAR model. The first three columns identify, respectively, the model number,
the MSS and the modified generalized cross validation criterion GCV* given in (23). The
fourth through sixth columns identify the number of estimated parameters, the number
of partition points and the maximum level of interaction in each model. Columns seven
and eight identify the length (in years) of each model’s limit cycle (if one exists) and the
number and lengths (in years) of the one or more types of ‘subcycles’ (ascent and descent
periods) within the limit cycle. We use MSS instead of $\text{MSS}^{1/2}$ to facilitate comparison of
the ASTAR models with other modeling efforts of the yearly sunspot numbers.

The different models in Table 2 occurred because the user parameters of the
ASTAR algorithm were varied. These parameters include: MI = 2, 3, or 4, the maximum
level of lagged predictor variable interaction; MS = 10 or 18, the minimum separation of a
lagged predictor variable’s partition points; M = 15, 20, or 25, the number of steps during
the forward-step algorithm; and p = 12 or 18, the number of lagged predictor variables (12
lagged predictor variables correspond to the maximum order of the model used by Tong
(1985) for prediction of the sunspot numbers). The separation of a predictor variable’s
partition points may be thought of as a smoothing parameter similar to the bandwidth
in kernel smoothing. Some of the resulting models were identical. For example, identical
3-way interaction models could result from using the different model parameters MI = 3
and MI = 4 if the MARS algorithm rejects all 4-way interactions. Also, most of the models
formed using MI = 2 were not of interest. Chapter IV provides a discussion of the user
parameters within the MARS algorithm.
TABLE 2. ASTAR MODELS FOR THE YEARLY WOLF SUNSPOT NUMBERS (1720-1920).

                       Number of    Number of    Level of      Length of     Number
                       Model        Interior     Model         Limit Cycle   (Lengths) of
Model   MSS     GCV*   Parameters   Thresholds   Interaction   (in years)    Subcycles
1       91.4    151.7  25           9            3             -
2       91.6    136.4  16           4            4             225           27 (8,9)
3       95.3    157.9  18           4            5             -
4       101.0   130.4  15           6            4             43            4 (10,11)
5       103.6   183.9  18           3            4             -
6       110.5   187.6  17           3            3             167           15 (11,12)
7       111.7   153.9  14           4            3             9             1 (9)
8       113.0   162.8  19           7            2             9             1 (9)
9       114.1   141.0  14           6            3             137           13 (10,11)
10      114.2   160.8  14           3            4             -
11      114.2   194.7  17           3            3             120           11 (10,11)
12      115.9   162.9  13           3            3             -
13      115.9   163.6  13           3            4             120           11 (10,11)
14      116.0   174.3  13           2            4             94            10 (9,10)
15      117.6   190.9  15           2            4             -
16      119.5   171.2  14           3            3             -
17      119.6   206.6  18           3            3             23            4 (5,6)
18      119.8   164.4  11           2            3             133           12 (11,12)
19      125.6   172.7  11           2            2             78            7 (11,12)
20      126.2   192.7  13           1            3             -
Some form of a limit cycle exists in 12 of the 20 ASTAR models. Also, 7 of
the 12 models, namely Models 4, 6, 9, 11, 13, 18 and 19, provide limit cycles with lengths
43, 167, 137, 120, 120, 133 and 78 years, respectively, and ‘subcycles’ whose lengths and
range are similar enough to the behavior of the yearly sunspot data (Table 1) to warrant
further analysis. Of these 7 models, 2 (Models 4 and 9) both have low GCV* values and
provide fitted residuals that appear, using test statistics and graphical analysis (e.g., Q-Q
plots), to be independent and Gaussian. Some of the test statistics for the fitted residuals
of these two models are provided in Table 3.
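The last two columns of Table 2 are tied together by simple arithmetic: a limit cycle of length $L$ made of $n$ subcycles whose lengths are $m$ or $m+1$ years must contain exactly $L - nm$ subcycles of length $m+1$. A small helper (hypothetical, for checking rows of the table) makes this explicit.

```python
def subcycle_split(cycle_length, n_subcycles, shorter):
    """Return (#subcycles of length `shorter`, #subcycles of length `shorter` + 1)."""
    longer = cycle_length - n_subcycles * shorter
    if not 0 <= longer <= n_subcycles:
        raise ValueError("cycle length inconsistent with these subcycle lengths")
    return n_subcycles - longer, longer


print(subcycle_split(137, 13, 10))  # Model 9: (6, 7)
print(subcycle_split(43, 4, 10))    # Model 4: (1, 3)
```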

TABLE 3. STATISTICS FOR THE FITTED RESIDUALS OF ASTAR MODELS 4
AND 9 OF THE YEARLY WOLF SUNSPOT NUMBERS (1720-1920).

Statistic   Model 4   Model 9   Note
Mean        0.000     0.000
GCV*        130.4     141.0
Skewness    .346      0.0813    0 for normal distribution
Kurtosis    0.153     0.673     0 for normal distribution
K-S         .349      .275      level of significance
C-M         > .15     > .15     level of significance
A-D         > .15     > .15     level of significance
L-M         .0466     .6892     level of significance

The skewness and kurtosis statistics serve as general indicators of the symmetry and
the heaviness of the tails of the sample distribution function $F_e(x)$ of the fitted residuals.
The Kolmogorov-Smirnov (K-S) test statistic measures the maximum absolute distance
between $F_e(x)$ and the hypothesized normal $N(0,1)$ distribution function $F_X(x)$, while the
Cramér-von Mises (C-M) statistic measures the integral of the squared distance between the
two functions. A drawback of the K-S and C-M tests is that they lack sensitivity to departures
from the null hypothesis that occur in the tails of a distribution. To overcome this lack of
sensitivity, the Anderson-Darling (A-D) test statistic weights the squared distance between
the two functions by $1/\{F_X(x)[1 - F_X(x)]\}$, which places more weight on the tails. A final
test for an independent and Gaussian error structure is provided by the Lin-Mudholkar (L-M)
(1980) test statistic, which tests normality against asymmetric alternatives. Even though the
GCV* for Model 4 is lower than that for Model 9, we rejected Model 4 because of the low level
of significance of the L-M test statistic and identified Model 9 as the best model (with limit
cycle) of the 20 models considered in the initial analysis.
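A standard-library sketch of four of the Table 3 diagnostics: sample skewness, excess kurtosis (both 0 for a normal distribution), the K-S distance and the A-D statistic, with residuals standardized to mean 0 and variance 1 and compared with $N(0,1)$. The function names are illustrative. Significance levels are not computed here, because when the mean and variance are estimated from the residuals the usual K-S and A-D tables no longer apply directly; the C-M and L-M statistics are omitted.

```python
import math


def standardize(x):
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in x) / n)
    return [(v - m) / sd for v in x]


def skewness(x):
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    return m3 / m2 ** 1.5


def excess_kurtosis(x):
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m4 / m2 ** 2 - 3.0


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ks_statistic(z):
    """Maximum absolute distance between the empirical CDF of z and N(0,1)."""
    zs = sorted(z)
    n = len(zs)
    d = 0.0
    for i, v in enumerate(zs):
        f = normal_cdf(v)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def anderson_darling(z, eps=1e-12):
    """A^2 = -n - (1/n) sum_i (2i - 1)[ln F(z_(i)) + ln(1 - F(z_(n+1-i)))]."""
    zs = sorted(z)
    n = len(zs)
    # clamp CDF values away from 0 and 1 so the logarithms stay finite
    F = [min(max(normal_cdf(v), eps), 1.0 - eps) for v in zs]
    s = sum((2 * i - 1) * (math.log(F[i - 1]) + math.log(1.0 - F[n - i]))
            for i in range(1, n + 1))
    return -n - s / n
```

Typical use would be `z = standardize(residuals)` followed by `ks_statistic(z)` and `anderson_darling(z)`.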

The MARS algorithm generated ASTAR Model 9 using 20 lagged predictor variables that
were permitted to form 1-, 2- and 3-way interactions during a maximum of $M = 15$ forward
steps of the forward-step algorithm. The minimum span between threshold values for a
single predictor variable was 18 sunspots; this span was chosen because there were 18
sunspot cycles between 1720 and 1920. Model 9 is

ASTAR Model 9

$$
\begin{aligned}
\hat{X}_\tau = {} & 2.711 + .960\,X_{\tau-1} + .332\,(47.0 - X_{\tau-5})_+ - .257\,(59.1 - X_{\tau-9})_+ \\
& - \,?\;X_{\tau-1}(X_{\tau-2} - 26.0)_+ + .017\,X_{\tau-1}(44.0 - X_{\tau-3})_+ \\
& - \,?\;X_{\tau-1}(\,? - X_{\tau-4})_+ + .004\,X_{\tau-1}(26.0 - X_{\tau-2})_+(X_{\tau-5} - 41.0)_+ ,
\end{aligned}
\qquad (38)
$$

where $(x)_+$ is a plus function with value $x$ if $x > 0$ and 0 otherwise. Model 9 has 14
parameters with 8 terms (a constant term and 3 one-way, 3 two-way and 1 three-way
interactions) and 6 threshold values (1 each on $X_{\tau-2}$, $X_{\tau-3}$, $X_{\tau-4}$ and $X_{\tau-9}$, and 2 on
$X_{\tau-5}$).

Figures 13-19 are various plots of the fitted values and residuals of ASTAR
Model 9. Figure 13 shows the fitted values of the model versus the yearly Wolf sunspot
numbers (1720-1920). The model fit is further examined using the estimated normalized
periodogram (Figure 14) of the sunspot number data [top] and model fit [bottom], and the
empirical quantile-quantile plot (Figure 15) and autocorrelation function plots (Figure 16) of
the fitted values of the model versus the yearly Wolf sunspot numbers (1720-1920). The model
appears to overfit and underfit the peaks and troughs about equally while capturing the general
structure of the yearly sunspot numbers. Again, note the spread of the cyclical component in
the spectrum (Figure 14), which has complicated efforts to model the sunspot numbers with a
fixed-period signal plus (possibly correlated) noise. The fitted residuals of the model are
examined using plots of residuals versus time and versus fitted values (Figure 17) and the
residual autocorrelation function plot (Figure 18). In Figure 17 the slight lack of negative
residuals for small fitted values of the model is attributed to the yearly sunspot numbers being
nonnegative random variables. In Figure 18 no pattern of dependence appears in the
autocorrelation function of the fitted residuals. Figure 19 shows the 137-year limit cycle of
Model 9 with its ascent and descent periods. The limit cycle is asymmetric, with amplitudes
ranging from 17.7 to 94.5 and an average ascent/descent period of 4.31/6.23 years versus
4.60/6.58 years for the actual yearly sunspot numbers from 1700 to 1920 (Table 1). In
comparing Model 9’s limit cycle (Figure 19) with the real yearly sunspot data (Figure 13), note
that the standard deviation of the fitted residuals is estimated as $(\text{MSS})^{1/2} = 10.69$ sunspots.
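For readers who want to reproduce a plot like Figure 14, here is a direct (O(N²)) discrete Fourier transform sketch of a periodogram at the Fourier frequencies $k/N$, scaled so that the ordinates sum to one. That normalization, like the function name, is an assumption; the exact scaling used for Figure 14 is not stated in the text. Applied to a pure sinusoid of period 11 sampled 220 times, essentially all of the power falls at frequency $20/220 = 1/11$.

```python
import cmath
import math


def normalized_periodogram(x):
    """Return [(frequency k/N, normalized ordinate)] for k = 1, ..., N // 2."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]  # remove the mean (zero-frequency term)
    freqs, ords = [], []
    for k in range(1, n // 2 + 1):
        s = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(d))
        freqs.append(k / n)
        ords.append(abs(s) ** 2 / n)
    total = sum(ords)
    return [(f, I / total) for f, I in zip(freqs, ords)]


x = [math.sin(2 * math.pi * t / 11) for t in range(220)]
pgram = normalized_periodogram(x)
peak_freq, peak_power = max(pgram, key=lambda fp: fp[1])
print(round(1 / peak_freq, 6))  # 11.0
```

A cycle whose length wanders between about 9 and 13 years, as in the sunspot data, spreads its power over several neighboring frequencies instead of one, which is the diffuseness visible in the top panel of Figure 14.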

[Figure 13 plot: Wolf Sunspot Numbers versus Year]

Figure 13. The yearly Wolf sunspot numbers (1700-1955) versus the fit of ASTAR Model 9 
(1720-1920). The yearly sunspot numbers (1700-1719) were used for initialization. 
The yearly sunspot numbers (1921-1955) were used to examine the prediction per- 
formance of ASTAR Model 9 and other models of the yearly sunspot numbers. 
[Figure 14 panels: Estimated Normalized Periodogram of the Sunspot Data 1720-1920 (top); Estimated Normalized Periodogram of ASTAR Model 9 1720-1920 (bottom)]




Figure 14. The estimated normalized periodogram of the yearly Wolf sunspot numbers (1720- 
1920) [top] versus the estimated normalized periodogram of ASTAR Model 9 (1720- 
1920) [bottom]. The broad conclusion from the top periodogram is that there is a 
rather diffuse cycle in the data with a period of about 11 years, and a longer period 
of about 67 years. 



Figure 20 is a graphical representation of ASTAR Model 9. Each column in
the plot represents an individual term in equation (38) that is identified along the plot’s
horizontal axis; e.g., (1) represents the term $.960X_{\tau-1}$ (second term of line one in equation
(38)), while (1),(2) represents the 2-way interaction term between $X_{\tau-1}$ and $X_{\tau-2}$ (first term
of line two in equation (38)). The vertical axis defines the range of values of the yearly
sunspot numbers during the modeling period from 1720-1920. The plot lines (1-way
interaction), symbols (2-way interaction) and the combination of lines and symbols (3-way
interaction) define the range of yearly sunspot number values that permit a nonzero
contribution to the value of $\hat{X}_\tau$ by a term of the model. Located underneath the plot is
summary information on the contributions of each model term during the modeling period,
including the number (Num) of times each model term made a nonzero contribution to the
value of $\hat{X}_\tau$ and
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Sunipot Numben 

Figure 15. The empirical quantile-quantile plot for the fitted values of ASTAR Model 9 versus 
the yearly Wolf sunspot numbers for the period 1720-1920. No obvious pattern 
exists. 
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AUTOCORREUTION 




UGS 



Figure 16. The autocorrelation functions of the yearly Wolf sunspot numbers and ASTAR Model 

9 for the period 1720-1920. The dominant cycle of period approximately 11 years is 
clearly evident. 



53 



Sunspoi Data; (1720-1920) Residuals vs *nine using IMRS lilode) 0 




I I i i I — i .1 ■ i 

17&0 1800 I 860 1900 

Y«ar 

Sunspot Data; (1720-1020) Residuals vs Fitted Values using MARS Mode) O 




L A aA 1 i I I 

0 40 00 120 

Fitted Velues 



ure 17. Fitted residuals from ASTAR Model 9 of the yearly Wolf sunspot numbers (1720- 
1920) versus year [top]. Fitted residuals versus the fitted yearly sunspot numbers 
from ASTAR Model 9 of the yearly Wolf sunspot numbers (1720-1920) [bottom]. 
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Figure 18. The autocorrelation function (first 40 lags) of the fitted residuals for ASTAR Model 9 
of the yearly Wolf sunspot numbers (1720-1920). There is no pattern of dependence 
in the residuals. The confidence bounds are approximate, individual confidence 
bands. 
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Ascent — 4, 5, 4, 5. 4, 5, 4, 4, 4, 4, 4, 4, 5 
Descent — 6, 6 , 6 , 6 . 6 , 6 , 7, 6 , 7,6,7, 6, 6 




Figure 19. The limit cycle for ASTAR Model 9 of the yearly Wolf sunspot numbers (1720-1920). 

The limit cycle is 137 years long with the indicated ascent and descent periods. The 
limit cycle is generated using ASTAR Model 9 initialized with the yearly sunspot 
numbers (1700-1719). The ‘subcycles’ have lengths of 10 or 11 years with 4 or 5 
years per ascent period and 6 or 7 years per descent period. 
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the minimum (Min), mean (Mean) and maximum (Max) values of each term’s nonzero 
contributions. 
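As a quick arithmetic check on the Figure 19 caption, the ascent and descent periods listed in the figure can be summed subcycle by subcycle. The short Python sketch below only transcribes those two lists; it reproduces the 137-year cycle length and the 10- or 11-year subcycles, but it does not simulate ASTAR Model 9 itself.

```python
# Ascent and descent periods (years) transcribed from the Figure 19 annotation
# for the limit cycle of ASTAR Model 9.
ascent = [4, 5, 4, 5, 4, 5, 4, 4, 4, 4, 4, 4, 5]
descent = [6, 6, 6, 6, 6, 6, 7, 6, 7, 6, 7, 6, 6]

# One subcycle = one ascent period followed by one descent period.
subcycles = [a + d for a, d in zip(ascent, descent)]
total = sum(subcycles)

print(len(subcycles), total)               # 13 137
print(sorted(set(subcycles)))              # [10, 11]
print(round(total / len(subcycles), 2))    # 10.54 (mean subcycle length in years)
```

The mean subcycle length of roughly 10.5 years is consistent with the diffuse 11-year cycle seen in the periodogram of Figure 14.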

The key point in discussing a graphical representation of a model such as that given in Figure 20 is that it can be used to analyze the use for and contribution of each of the terms in an ASTAR model. For example, using the ASTAR model of the yearly sunspot numbers given in equation (38) and graphically displayed in Figure 20, the 2-way interaction term (1),(4), which represents the term $-0.032X_{t-1}(17.1 - X_{t-4})_+$, has a nonzero contribution to the value of $X_t$ if and only if $X_{t-1} > 0$ and $X_{t-4} < 17.1$. Using Figures 20 and 13 it can be seen that this term's largest (most negative) contribution arises when $X_{t-1}$ is at a yearly sunspot cycle peak and $X_{t-4}$ is in a yearly sunspot cycle trough. Thus the purpose of this term is to initiate the downward turn of $X_t$ to the next yearly sunspot cycle trough. Another example is the 3-way interaction term (1),(2),(5), which represents the term $0.004X_{t-1}(26.0 - X_{t-2})_+(X_{t-5} - 41.0)_+$. This term makes a nonzero (positive) contribution to the value of $X_t$ if and only if $X_{t-1} > 0$, $X_{t-2} < 26$ and $X_{t-5} > 41$. Again using Figures 20 and 13 it can be seen that this term's maximum contribution arises when $X_{t-1}$ and $X_{t-2}$ are in a yearly sunspot cycle trough and $X_{t-5}$ is at a yearly sunspot cycle peak. Thus the purpose of this term is to initiate the upward turn of $X_t$ to the next yearly sunspot cycle peak. Likewise, the (1),(2) term has a large (negative) contribution when both $X_{t-1}$ and $X_{t-2}$ are high. Like the (1),(4) term, the (1),(2) term is used to initiate the downward turn of $X_t$ to the next yearly sunspot cycle trough. However, unlike the (1),(4) term, whose number of nonzero contributions to the value of $X_t$ is severely restricted by the threshold at 17.1, the (1),(2) term continues to drive $X_t$ into the next trough until $X_{t-2} < 26$. Similar analysis can be performed on other model terms or combinations of model terms.
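The 'if and only if' conditions above follow directly from the truncated linear function $(u)_+ = \max(u, 0)$. A minimal Python sketch, using only the coefficients and thresholds quoted for the (1),(4) and (1),(2),(5) terms, makes the switching behavior explicit; the lagged sunspot values fed to it are hypothetical.

```python
def pos(u):
    """Truncated linear function (u)_+ = u if u > 0, else 0."""
    return max(u, 0.0)


def term_1_4(x1, x4):
    """(1),(4) term of ASTAR Model 9: -0.032 X_{t-1} (17.1 - X_{t-4})_+."""
    return -0.032 * x1 * pos(17.1 - x4)


def term_1_2_5(x1, x2, x5):
    """(1),(2),(5) term: 0.004 X_{t-1} (26.0 - X_{t-2})_+ (X_{t-5} - 41.0)_+."""
    return 0.004 * x1 * pos(26.0 - x2) * pos(x5 - 41.0)


# Hypothetical lagged sunspot values.
print(term_1_4(100.0, 10.0))         # X_{t-1} high, X_{t-4} in a trough: negative pull
print(term_1_4(100.0, 20.0))         # X_{t-4} >= 17.1: no contribution
print(term_1_2_5(10.0, 8.0, 90.0))   # X_{t-2} < 26 and X_{t-5} > 41: positive push
print(term_1_2_5(10.0, 30.0, 90.0))  # X_{t-2} >= 26: no contribution
```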

Other useful graphical displays for the ASTAR models of the yearly sunspot numbers are the individual plots of each term's contribution to the value of $X_t$ versus year. These plots complement Figure 20 and permit the comparison of the magnitude and location of each term's contribution. In summary, the graphical displays mentioned above provide a valuable analytical tool for studying nonlinear time series models such as those developed with the ASTAR methodology.
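The Num, Min, Avg and Max summaries beneath Figure 20 can be sketched as statistics over the nonzero values of one term's contribution series. In the Python sketch below, `term_contributions`, `summarize_contributions` and the eight-value series are hypothetical; the term coefficient and threshold are those quoted for the (1),(4) term, and the average is taken over nonzero contributions only, as the text describes.

```python
def pos(u):
    """Truncated linear function (u)_+."""
    return max(u, 0.0)


def term_contributions(x, lag_a, lag_b, coef, knot):
    """Contributions coef * X_{t-lag_a} * (knot - X_{t-lag_b})_+ for every t
    at which both lagged values are available (hypothetical helper)."""
    start = max(lag_a, lag_b)
    return [coef * x[t - lag_a] * pos(knot - x[t - lag_b]) for t in range(start, len(x))]


def summarize_contributions(contrib):
    """Num, Min, Avg, Max over the nonzero contributions of one term."""
    nonzero = [c for c in contrib if c != 0.0]
    if not nonzero:
        return {"Num": 0, "Min": None, "Avg": None, "Max": None}
    return {
        "Num": len(nonzero),
        "Min": min(nonzero),
        "Avg": sum(nonzero) / len(nonzero),
        "Max": max(nonzero),
    }


# Hypothetical eight-year stretch of sunspot numbers (trough, peak, decline).
x = [5.0, 12.0, 60.0, 110.0, 95.0, 50.0, 20.0, 8.0]
s = summarize_contributions(term_contributions(x, 1, 4, -0.032, 17.1))
print(s)
```

Plotting the list returned by `term_contributions` against year would give the per-term contribution plots described above.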
[Figure 20: ASTAR Model 9, Sunspot Numbers (1720-1920)]

Term     (1)      (5)     (9)      (1),(2)   (1),(3)   (1),(4)   (1),(2),(5)
Num      200      117     135      127       109       55        35
Min      1.34     .033    -15.2    -43.8     .01       -49.2     .028
Avg      43.8     6.58    -8.85    -8.28     13.36     -13.3     9.71
Max      143.2    15.6    -.566    -.014     62.4      -.08      42.98

Figure 20. Graphical representation of ASTAR Model 9, given in equation (38), of the yearly Wolf sunspot numbers (1720-1920). Each column in the plot represents a term of the model whose contributions to the value of $X_t$ are summarized underneath the plot. Lines in the first three columns, labeled (1), (5) and (9), define the range of values for nonzero contributions to the value of $X_t$ by the linear terms $X_{t-1}$, $X_{t-5}$ or $X_{t-9}$ respectively; symbols in the next three columns, labeled (1),(2), (1),(3) and (1),(4), define the range of values for nonzero contributions to the value of $X_t$ by the 2-way interaction terms $X_{t-1}X_{t-2}$, $X_{t-1}X_{t-3}$ and $X_{t-1}X_{t-4}$ respectively; and in the last column the combination of lines and symbols defines the range of values for nonzero contributions to the value of $X_t$ by the 3-way interaction term $X_{t-1}X_{t-2}X_{t-5}$.



c. Sunspot Number Prediction using ASTAR Models 

The predictive performance of ASTAR Model 9 (38) was investigated by comparing its forward-step predictions with the forward-step predictions of other models that were developed using the 221 yearly sunspot numbers from 1700-1920. These include forward-step predictions (Moeanaddin, 1989) for the 35 yearly sunspot numbers from 1921-1955 using the Full Linear Autoregressive, Bilinear Subset (Rao, 1984) and Self-Exciting Threshold Autoregressive (SETAR) (Tong, 1983) models. The Full Linear Autoregressive (39) and Bilinear Subset (40) models used the 10 yearly sunspot numbers from 1700-1709 for initialization, while the SETAR (41) and ASTAR (38) models used the 20 yearly sunspot numbers from 1700-1719 for initialization.

Full Linear Autoregressive

$$X_t = \cdots + 1.216X_{t-1} - 0.467X_{t-2} - 0.142X_{t-3} + 0.169X_{t-4} - 0.147X_{t-5} + 0.054X_{t-6} - 0.053X_{t-7} + 0.067X_{t-8} + 0.113X_{t-9} \qquad (39)$$




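Bilinear Subset

$$\begin{aligned}
X_t = 6.886 &+ 1.501X_{t-1} - 0.767X_{t-2} + 0.115X_{t-9} - 0.014X_{t-2}\varepsilon_{t-1} \\
&+ 0.006X_{t-8}\varepsilon_{t-1} - 0.007X_{t-1}\varepsilon_{t-3} + 0.006X_{t-4}\varepsilon_{t-3} \\
&+ 0.004X_{t-1}\varepsilon_{t-6} + 0.004X_{t-2}\varepsilon_{t-4} + 0.002X_{t-3}\varepsilon_{t-2}
\end{aligned} \qquad (40)$$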



Self-Exciting Threshold (SETAR)

$$X_t = \begin{cases}
10.544 + 1.692X_{t-1} - 1.159X_{t-2} + 0.236X_{t-3} + 0.150X_{t-4}, & \text{if } X_{t-3} \le 36.6, \\[4pt]
7.804 + 0.743X_{t-1} - 0.041X_{t-2} - 0.202X_{t-3} + 0.173X_{t-4} & \\
\quad - 0.227X_{t-5} + 0.019X_{t-6} + 0.161X_{t-7} - 0.256X_{t-8} & \\
\quad + 0.319X_{t-9} - 0.389X_{t-10} + 0.431X_{t-11} - 0.397X_{t-12}, & \text{if } X_{t-3} > 36.6.
\end{cases} \qquad (41)$$
The forward-step predictions for each of these models are obtained by fixing both the model terms and coefficients during the entire prediction period. The predictive mean sum of squares errors (PMSE) obtained by these models and by ASTAR Model 9 are given in Table 4.
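The forward-step predictions and the σ̂²(i) values in Table 4 can be sketched as follows, assuming that an i-step prediction is obtained by iterating the fixed model and substituting earlier predictions for lagged values that are not yet observed (a plug-in scheme). This is an assumption for illustration: Moeanaddin (1989) and the other cited studies may have computed multi-step predictions differently, and the function names, the averaging over all target years, and the toy model in the example are hypothetical.

```python
def iterate_forecast(model, history, steps):
    """Plug-in multi-step forecast with a fixed model.

    `model` maps the history (oldest first) to the next value; each prediction
    is appended and reused as a lagged input for the following step."""
    h = list(history)
    preds = []
    for _ in range(steps):
        nxt = model(h)
        preds.append(nxt)
        h.append(nxt)
    return preds


def forward_step_pmse(model, x, first_target, max_step):
    """sigma^2(i) for i = 1..max_step: mean squared error of the i-step
    predictions of x[first_target:], each made from data up to index target - i.
    The model must be able to work with the shortest history this produces."""
    if first_target < max_step:
        raise ValueError("need at least max_step observations before first_target")
    result = []
    for i in range(1, max_step + 1):
        sq_errors = []
        for target in range(first_target, len(x)):
            origin = target - i                      # last observed index
            pred = iterate_forecast(model, x[:origin + 1], i)[-1]
            sq_errors.append((x[target] - pred) ** 2)
        result.append(sum(sq_errors) / len(sq_errors))
    return result


# Toy check: persistence forecasts of a straight line miss by exactly i at step i.
line = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(forward_step_pmse(lambda h: h[-1], line, 3, 3))   # [1.0, 4.0, 9.0]
```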

TABLE 4. FORWARD-STEP PREDICTIONS OF THE YEARLY WOLF SUNSPOT NUMBERS: The mean sum of squares error (MSS), the number of model parameters and the predictive mean sum of squares error σ̂²(i) for the ith forward-step prediction for the period (1921-1955) of the Full Linear Autoregressive (AR), Bilinear Subset, SETAR and ASTAR models of the yearly Wolf sunspot numbers for the period (1700-1920).

                         AR        Bilinear Subset   SETAR     ASTAR
                         (Full)    (Rao)             (Tong)    Model 9
MSS                      199.3     124.3             153.7     114.1
Number of Parameters     10        11                19        14
σ̂²(1)                    190.9     123.8             153.9     132.5
σ̂²(2)                    414.8     337.6             388.4     314.8
σ̂²(3)                    652.1     569.7             672.7     467.3
σ̂²(4)                    797.3     621.3             641.2     415.1
σ̂²(5)                    770.8     718.4             835.3     367.2
σ̂²(6)                    786.4     732.4             900.7     408.0
σ̂²(7)                    789.0     781.7             993.8     441.2
σ̂²(8)                    827.8     833.2             1083.6    455.2



The performance of the ASTAR model for forecasting the yearly sunspot numbers from 1921-1955 is a considerable improvement over the AR and Threshold models for every forward step, and it is an improvement over the Bilinear Subset model for every forward step except the first. It is also interesting, and surprising, that the predictive mean sum of squares error for the ASTAR model decreases in the fourth and fifth steps before increasing again. This phenomenon was also identified in subsequent analysis of other ASTAR models that have limit cycles. We attribute this interesting phenomenon to the underlying limit cycle of the models (Tong, 1985, and Moeanaddin, 1989).

While the prediction of the yearly sunspot numbers for 1921-1955 is a considerable improvement over the previous threshold and bilinear modeling efforts, it may be difficult to justify using the kth forward-step prediction as a conditional expectation when making the ith forward-step prediction of an ASTAR model with a threshold on $X_{t-j}$ and $i > k > j$. Tong (1983, 1985) suggests, as one approach to this problem, 'moving the trigger', i.e., prohibiting a threshold from forming on a lagged variable with lag less than the desired maximum forward-step prediction. Tong (1983) reported several TAR models that 'moved the trigger' and were used for prediction of the sunspot numbers. TAR Model AS7133 (42) was developed with a threshold value on $X_{t-7}$ using the sunspot numbers from 1700-1890 and used to obtain the forward-step predictions of the sunspot numbers from 1921-1955 (Moeanaddin, 1989).



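TAR Model AS7133

$$X_t = \begin{cases}
9.267 + 0.987X_{t-1} - 0.307X_{t-2} - 0.105X_{t-3} + 0.166X_{t-4} & \\
\quad - 0.297X_{t-5} + 0.285X_{t-6} - 0.155X_{t-7} - 0.171X_{t-8} & \\
\quad + 0.210X_{t-9} - 0.041X_{t-10} + 0.353X_{t-11} - 0.196X_{t-12}, & \text{if } X_{t-7} \le 58.55, \\[4pt]
26.159 + 1.577X_{t-1} - 1.240X_{t-2}, & \text{if } X_{t-7} > 58.55.
\end{cases} \qquad (42)$$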



To incorporate this idea, MARS was used to formulate several models of the yearly sunspot numbers by 'moving the trigger'. This is simple to do, since in the input to MARS one can specify that particular predictor variables are not permitted to have a knot, i.e., that they enter linearly if included. For this modeling effort the interest was to permit prediction of approximately one sunspot number cycle. Thus the lagged variables with lag less than or equal to eleven were not permitted to form knots. Note that the modeling effort was restricted to 1700-1890 to correspond to modeling efforts by Tong (1983) that also 'moved the trigger'.
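The bookkeeping behind 'moving the trigger' is simple: a k-step-ahead prediction from origin n needs the threshold variable $X_{n+k-d}$, which is still an observed value only if $k \le d$. The Python sketch below (function names hypothetical) computes the largest such k for a set of threshold lags and lists the lags that remain eligible for knots when lags up to eleven are excluded; the maximum lag of 14 in the example is an assumption chosen to match the single threshold lag of the Table 5 models.

```python
def knot_eligible_lags(max_lag, min_knot_lag):
    """Lags that may still carry a knot when knots are prohibited below min_knot_lag."""
    return [lag for lag in range(1, max_lag + 1) if lag >= min_knot_lag]


def max_safe_horizon(threshold_lags):
    """Largest k such that, for a k-step prediction from origin n, every threshold
    variable X_{n+k-d} is observed (requires k <= d for each threshold lag d).
    A model with no thresholds imposes no such limit."""
    return min(threshold_lags) if threshold_lags else float("inf")


print(knot_eligible_lags(14, 12))   # [12, 13, 14]
print(max_safe_horizon([7]))        # 7  (threshold on X_{t-7}, as in AS7133)
print(max_safe_horizon([14]))       # 14 (single threshold on X_{t-14})
```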

Table 5 gives the forward-step predictions from two ASTAR models created using MARS with the restriction that thresholds were prohibited on $X_{t-1}$ through $X_{t-11}$. These two new models both have a single threshold on $X_{t-14}$ and thus permit forecasts of up to 14 steps ahead (more than a full yearly sunspot cycle) without the difficulties discussed in the previous paragraph. Model GCV9-322 (43) has 8 parameters and includes a 4-way interaction term, while Model GCV9-1028 (44) has 11 parameters and includes only linear and 2-way interaction terms.



ASTAR Model GCV9-322

$$\begin{aligned}
X_t = 10.760 &+ 1.326X_{t-1} - 0.714X_{t-2} - 0.003X_{t-1}X_{t-12} + 0.568X_{t-2}X_{t-11} \\
&+ 0.0002X_{t-1}X_{t-6}(X_{t-14} - 73.9)_+ \\
&- 0.000003X_{t-1}X_{t-2}X_{t-6}(X_{t-14} - 73.9)_+
\end{aligned} \qquad (43)$$



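ASTAR Model GCV9-1028

$$\begin{aligned}
X_t = -11.256 &+ 1.257X_{t-1} + 0.576X_{t-9} - 0.008X_{t-1}X_{t-2} + 0.002X_{t-2}X_{t-3} - \text{0M2}\,X_{t-2}X_{t-5} \\
&+ 0.003X_{t-2}X_{t-11} - 0.006X_{t-3}X_{t-9} - 0.003X_{t-9}X_{t-10} \\
&+ 0.004X_{t-1}(X_{t-14} - 60.0)_+
\end{aligned} \qquad (44)$$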



As with the previous predictions of the yearly sunspot numbers from 1921-1955 with ASTAR Model 9 (Table 4), the Bilinear Subset model has the smallest predictive mean sum of squares error for the first forward-step prediction. However, from the second forward step for GCV9-322 (the third forward step for GCV9-1028), the ASTAR models have the smallest predictive mean sum of squares error and are again a considerable improvement over the Full Linear Autoregressive, Bilinear Subset and SETAR models for the period 1921-1955. Also, note again that the predictive mean sum of squares errors for the ASTAR and SETAR models decrease after several forward steps before increasing again.

TABLE 5. FORWARD-STEP PREDICTIONS OF THE YEARLY WOLF SUNSPOT NUMBERS: The mean sum of squares error (MSS), the number of model parameters and the predictive mean sum of squares error σ̂²(i) for the ith forward-step prediction for the period (1921-1955) of the AR, Bilinear Subset, SETAR and ASTAR models of the yearly Wolf sunspot numbers. Here, in contrast to the ASTAR model used for the results in Table 4, thresholds were not permitted for lagged predictor variables in MARS unless the lag was greater than eleven. The modeling period for the AR and Bilinear Subset models is (1700-1920), while the modeling period for the SETAR and ASTAR models is (1700-1890).

                         AR        Bilinear Subset   SETAR AS7133   ASTAR Model   ASTAR Model
                                   (Rao)             (Tong)         GCV9-322      GCV9-1028
MSS                      199.3     124.3             152.3          155.5         149.9
Number of Parameters     10        11                17             8             11
σ̂²(1)                    190.9     123.8             161.9          158.3         205.1
σ̂²(2)                    414.8     337.6             362.6          333.4         425.0
σ̂²(3)                    652.1     569.7             593.2          515.3         472.2
σ̂²(4)                    797.3     621.3             650.1          449.1         416.6
σ̂²(5)                    770.8     718.4             613.2          404.2         402.6
σ̂²(6)                    786.4     732.4             584.8          377.7         384.0
σ̂²(7)                    789.0     781.7             508.1          373.4         378.6
σ̂²(8)                    827.8     833.2             531.8          372.8         391.0
σ̂²(9)                    862.1     900.6             518.8          319.1         389.8
σ̂²(10)                   895.6     961.9             520.9          302.7         379.5
σ̂²(11)                   982.9     1013.8            563.0          297.3         371.0
σ̂²(12)                   1168.5    1139.2            650.5          361.9         419.1

Moeanaddin (1989) used the AR, Bilinear and SETAR models for 'risky' prediction over 1956-1979, a period of roughly two yearly sunspot cycles. This prediction period is 'risky' because it includes an 'abnormal' jump in the yearly sunspot numbers from 38.0 in 1955 to 141.7 in 1956. The forward-step PMSEs of the SETAR models are slightly smaller than those of the ASTAR models for this period, although the potential of the ASTAR models developed in this chapter was not fully explored. However, the bilinear model's predictions are rather explosive. Moeanaddin (1989) indicates that the collapse of the bilinear model's predictions may be due to its non-invertibility and the effect of the influential observation in 1956.

G. SUMMARY 

MARS is a new nonparametric regression modeling methodology, due to Friedman, that utilizes low-order regression spline modeling and a modified recursive partitioning strategy to exploit the localized low-dimensional behavior of the data used to construct $\hat{f}(\mathbf{x})$. Although MARS is a computationally intensive regression methodology, it provides a systematic methodology for deriving nonlinear threshold models for high-dimensional data that are naturally continuous in the domain of the predictor variables and can have multiple partitions and predictor variable interactions.

By letting the predictor variables in MARS be lagged values of a time series, one 
obtains an adaptive spline threshold autoregressive (ASTAR) model, which is a new method 
for nonlinear modeling of time series that extends the threshold autoregression methodology 
developed by Tong (1985). A significant feature of ASTAR when modeling time series data 
with periodic behavior is its ability to produce continuous models for the regression function 
with underlying sustained oscillations (limit cycles). An initial analysis of the yearly Wolf 
sunspot numbers (1700-1890) and (1700-1920) using ASTAR produced several models with 
underlying limit cycles. When used to predict the yearly sunspot numbers (1921-1955), the 
ASTAR models are a significant improvement over existing Threshold and Bilinear models. 

An important aspect of any overall regression modeling effort is the interpretation 
and analysis that answers questions about the model’s behavior and reveals relationships 
between the response variable (output) and predictor variables (input). However, the
functional form of an ASTAR model, with its combination of different predictor variables and
multiple threshold values, makes its straightforward interpretation and analysis difficult. 
In this regard a graphical representation was developed to permit the interpretation and 
analysis of ASTAR Model 9 of the Wolf sunspot numbers. Further enhancements are ob- 
tained by integrating color in the graphical representation. It was shown that this graphical 
representation can be used to analyze the use for and contribution of each of the terms in 
an ASTAR model. 
III. SEMI-MULTIVARIATE NONLINEAR MODELING OF TIME
SERIES SYSTEMS USING MULTIVARIATE ADAPTIVE
REGRESSION SPLINES (MARS)



A. INTRODUCTION 

While ASTAR models of univariate time series certainly have widespread applicability, the identification of semi-multivariate threshold autoregressive models that consider the complex interactions within a time series system would have even greater applicability. In this chapter the ASTAR methodology developed in Chapter II is extended to the semi-multivariate ASTAR modeling of a time series system. This builds upon semi-multivariate threshold autoregressive (TAR) modeling by Tong et al. (1985). Thus MARS is used to model a single response variable of a time series system using predictor variables that are the lagged values of both the response and input time series. For example, for $t = 1, 2, \ldots, N$, let $\{Y_t\}$ and $\{Z_t\}$ be time series that represent system inputs and $\{X_t\}$ be a time series representing the system output. The set of possible predictor variables for this semi-multivariate time series system is $X_{t-1}, \ldots, X_{t-d_1}$; $Y_t, Y_{t-1}, \ldots, Y_{t-d_2}$; and $Z_t, Z_{t-1}, \ldots, Z_{t-d_3}$, where the maximum lags $d_1$, $d_2$ and $d_3$ are not necessarily equal. Also, $d_1 + (d_2 + 1) + (d_3 + 1) = p$, the total number of predictor variables. If MARS is applied to this system of predictor variables, the result is a semi-multivariate ASTAR model that seems well suited for taking into account the complex interactions among the multivariate, cross-correlated, lagged predictor variables of a time series system. The analysis of an Icelandic river using past riverflow, temperature and precipitation to develop semi-multivariate ASTAR models extends earlier TAR modeling of this Icelandic riverflow. Note that the same problem for normal multivariate linear time series processes such as ARMA models may be treated by Kalman filtering (see, e.g., Gelb, 1974). However, here we are not concerned with complete multivariate models, in the sense of Box and Tiao (1977) and Tiao and Tsay (1989).
B. SEMI-MULTIVARIATE NONLINEAR TIME SERIES MODELING USING MARS

There are numerous semi-multivariate time series systems that appear well suited for analysis using the MARS methodology, such as sea surface temperatures using lagged temperature, surface winds and time as predictor variables (Breaker and Lewis, 1985; Altman, 1987) or riverflow using lagged riverflow, temperature and precipitation as predictor variables (Gudmundsson, 1970; Tong et al., 1985). One possible source of nonlinearity in the riverflow system is the change in temperature above and below freezing: below freezing, precipitation (snow) does not 'run off' as rapidly as precipitation (rain) at higher temperatures. Other applications exist for any multivariate time series system with suspected nonlinear behavior, if the objective is to model a single output stream given multiple input streams to the system. In particular, in Chapter IV a series of sea surface temperatures will be analyzed. What is of more interest, as noted above, is to model the current sea surface temperatures as a function of lagged sea surface temperatures, lagged wind shear (wind velocity squared) and lagged wind direction.

To provide a framework for the semi-multivariate time series model, suppose that for $t = 1, 2, \ldots, N$, $\{Y_t\}$ and $\{Z_t\}$ denote the input time series and $\{X_t\}$ the output time series for a time series system we wish to model. The complete description of the general form of a semi-multivariate time series model is very complex. However, using the notation $\|$ (from Tong, 1985) to separate the possible predictor variables of each different time series and following (1), we can nominally describe $X_t$ with the semi-multivariate time series regression model



$$X_t = f\left(1 \,\|\, X_{t-1}, X_{t-2}, \ldots, X_{t-d_1} \,\|\, Y_t, Y_{t-1}, \ldots, Y_{t-d_2} \,\|\, Z_t, Z_{t-1}, \ldots, Z_{t-d_3}\right) + \varepsilon_t, \qquad (45)$$



where $f(\cdot)$ represents some functional form of its argument, 1 denotes a model constant, and the maximum lags $d_1$, $d_2$ and $d_3$ are not necessarily equal. Also, $Y_t$ and $Z_t$, the current values of the input time series, may or may not be included in (45), depending on the time series system and the use to which the model is to be put. Generally, prediction of $X_t$ at time t would preferably be done without knowledge of $Y_t$ and $Z_t$, because if $X_t$ is measurable, it will generally be known only when $Y_t$ and $Z_t$ are finally known.
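A sketch of how the candidate predictors in (45) might be laid out for MARS is given below in Python. The helper names and the bracketed naming scheme are hypothetical; the block builds one row per usable time t, optionally drops the current inputs $Y_t$ and $Z_t$, and confirms the count $p = d_1 + (d_2 + 1) + (d_3 + 1)$ given in the introduction to this chapter.

```python
def lag_name(series, k):
    """Readable label such as 'Y[t]' or 'X[t-3]' (hypothetical naming scheme)."""
    return f"{series}[t]" if k == 0 else f"{series}[t-{k}]"


def predictor_names(d1, d2, d3, include_current=True):
    """Candidate predictors of X_t in (45): X_{t-1..t-d1}, Y_{t..t-d2}, Z_{t..t-d3}."""
    first = 0 if include_current else 1   # drop Y_t and Z_t when they are not available
    return ([lag_name("X", k) for k in range(1, d1 + 1)]
            + [lag_name("Y", k) for k in range(first, d2 + 1)]
            + [lag_name("Z", k) for k in range(first, d3 + 1)])


def lagged_rows(x, y, z, d1, d2, d3, include_current=True):
    """Predictor rows (same order as predictor_names) and the responses X_t."""
    first = 0 if include_current else 1
    rows, response = [], []
    for t in range(max(d1, d2, d3), len(x)):
        row = [x[t - k] for k in range(1, d1 + 1)]
        row += [y[t - k] for k in range(first, d2 + 1)]
        row += [z[t - k] for k in range(first, d3 + 1)]
        rows.append(row)
        response.append(x[t])
    return rows, response


names = predictor_names(3, 2, 1)
print(len(names), names)   # p = 3 + (2 + 1) + (1 + 1) = 8
```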






Both Tong (1985) and Tsay (1989) suggest a methodology for semi-multivariate TAR modeling that follows the TAR methodology for a univariate time series, i.e., identification of linear semi-multivariate autoregressive time series models in each disjoint subregion of the predictor variable space. For example, their notation for a very simple two-subregion semi-multivariate TAR model based on a single partition in the space of all the predictor variables at, say, $Z_t = 3$ is



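$$X_t = \begin{cases}
(0.5 \,\|\, 1.1 \,\|\, -2.7,\ 1.1 \,\|\, 4.3,\ -2.8) & \text{if } Z_t \le 3, \\
(2.3 \,\|\, 0.1,\ -0.2 \,\|\, 1.7 \,\|\, -0.1,\ 2.1) & \text{if } Z_t > 3,
\end{cases}$$

which represents the model

$$X_t = \begin{cases}
0.5 + 1.1X_{t-1} - 2.7Y_t + 1.1Y_{t-1} + 4.3Z_t - 2.8Z_{t-1} & \text{if } Z_t \le 3, \\
2.3 + 0.1X_{t-1} - 0.2X_{t-2} + 1.7Y_t - 0.1Z_t + 2.1Z_{t-1} & \text{if } Z_t > 3.
\end{cases} \qquad (46)$$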

The semi-multivariate TAR methodologies of Tong (1985) and Tsay (1989) focus on uni- 
variate and bivariate scatterplot analysis and on the evaluation of empirical percentiles of 
preselected threshold variable candidates. These methods are also permitted with MARS. 
However, the predictor variables of a time series system may possess physical behavior not 
readily apparent when we restrict our modeling methodology to the above approach. The 
key point is that Tong’s and Tsay’s methods are time consuming, generally limited to one 
or two dimensions and may not be sufficient for identifying changes in the physical behavior 
of a nonlinear time series system. Thus, a semi-multivariate TAR model is still burdened 
with the limitations of a univariate TAR model, i.e., a threshold model created with the 
piecewise linear models from each disjoint subregion of a domain D of the predictor vari- 
ables. Also the TAR model is usually discontinuous at each subregion boundary (threshold) 
and is limited to a small number of thresholds, most often using only one variable, due to 
the difficulties associated with the threshold selection process. 
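The discontinuity at a TAR threshold can be seen numerically by evaluating both regimes of the example model (46) at the boundary $Z_t = 3$. The Python sketch below assumes that $Z_t = 3$ belongs to the lower regime and uses hypothetical values of 1.0 for every other predictor.

```python
def tar46(x1, x2, y0, y1, z0, z1):
    """Two-regime semi-multivariate TAR model (46).

    Arguments are X_{t-1}, X_{t-2}, Y_t, Y_{t-1}, Z_t, Z_{t-1}; the boundary
    Z_t = 3 is assigned to the lower regime (an assumption)."""
    if z0 <= 3:
        return 0.5 + 1.1 * x1 - 2.7 * y0 + 1.1 * y1 + 4.3 * z0 - 2.8 * z1
    return 2.3 + 0.1 * x1 - 0.2 * x2 + 1.7 * y0 - 0.1 * z0 + 2.1 * z1


# Every other predictor fixed at a hypothetical value of 1.0.
below = tar46(1.0, 1.0, 1.0, 1.0, 3.0, 1.0)
above = tar46(1.0, 1.0, 1.0, 1.0, 3.0 + 1e-9, 1.0)
print(round(below, 3), round(above, 3))   # 10.1 5.7: a jump of about -4.4 at Z_t = 3
```

An infinitesimal change in $Z_t$ across the threshold changes the fitted value by more than 4 units, which is the kind of boundary discontinuity the continuous MARS construction avoids.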

The MARS methodology supplements Tong's (1985) and Tsay's (1989) approach by admitting a more general class of continuous nonlinear semi-multivariate threshold models than permitted with the semi-multivariate TAR methodology, and by providing a more systematic (automatic) way of fitting the model. The methodology for developing this class of nonlinear semi-multivariate threshold models is called SMASTAR (Semi-Multivariate Adaptive Spline Threshold Autoregression). Following Chapter II, the fact that the MARS algorithm yields a more general class of continuous nonlinear semi-multivariate threshold models than is permitted with the semi-multivariate TAR methodology (for example, a model such as (46)) can be shown using a simple example.

Let $X_t$ be a time series we wish to model with the lagged predictor variables $X_{t-1}$, $X_{t-2}$, $Y_{t-1}$, $Y_{t-2}$, $Z_{t-1}$ and $Z_{t-2}$. Also, let the notation $(U - t)_\pm$ represent the pair $(t - U)_+$ and $(U - t)_+$, where $(u)_+ = u$ if $u > 0$ and 0 otherwise. Extending the example for the ASTAR time series model developed in Chapter II, each forward step of the MARS algorithm selects one and only one set of new terms for the SMASTAR time series model from the candidates specified by previously selected terms of the model. For our example problem the sets of candidates in the initial forward step of the MARS algorithm are



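$$\begin{aligned}
&(X_{t-1} - t_x^*)_\pm \text{ or } (X_{t-2} - t_x^*)_\pm \text{ or} \\
&(Y_{t-1} - t_y^*)_\pm \text{ or } (Y_{t-2} - t_y^*)_\pm \text{ or} \\
&(Z_{t-1} - t_z^*)_\pm \text{ or } (Z_{t-2} - t_z^*)_\pm,
\end{aligned} \qquad (47)$$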

where $t_x^*$, $t_y^*$ and $t_z^*$ are unknown partition points (thresholds) in the range of their respective lagged predictor variables. For our example problem, assume that the MARS algorithm selects the lagged predictor variable $X_{t-2}$ with threshold value $t_x^* = t_1$, i.e., $(X_{t-2} - t_1)_+$ and $(t_1 - X_{t-2})_+$ are the initial terms (other than the constant) in the SMASTAR time series model. The sets of candidates in the second forward step of the MARS algorithm include all univariate candidates in (47) and the new sets of multivariate candidates (interactions):



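$$\begin{aligned}
&(X_{t-1} - t_x^*)_\pm (X_{t-2} - t_1)_+ \text{ or } (X_{t-1} - t_x^*)_\pm (t_1 - X_{t-2})_+ \text{ or} \\
&(Y_{t-1} - t_y^*)_\pm (X_{t-2} - t_1)_+ \text{ or } (Y_{t-1} - t_y^*)_\pm (t_1 - X_{t-2})_+ \text{ or} \\
&(Y_{t-2} - t_y^*)_\pm (X_{t-2} - t_1)_+ \text{ or } (Y_{t-2} - t_y^*)_\pm (t_1 - X_{t-2})_+ \text{ or} \\
&(Z_{t-1} - t_z^*)_\pm (X_{t-2} - t_1)_+ \text{ or } (Z_{t-1} - t_z^*)_\pm (t_1 - X_{t-2})_+ \text{ or} \\
&(Z_{t-2} - t_z^*)_\pm (X_{t-2} - t_1)_+ \text{ or } (Z_{t-2} - t_z^*)_\pm (t_1 - X_{t-2})_+,
\end{aligned} \qquad (48)$$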
due to the initial selection of $(X_{t-2} - t_1)_+$ and $(t_1 - X_{t-2})_+$ as terms in the SMASTAR time series model. One and only one of the sets of terms from (47) or (48) is selected for inclusion in the model in the next forward step of the MARS algorithm. It follows that SMASTAR time series models could have multiple thresholds on one variable, say $X_{t-2}$ in our example, by again selecting $(X_{t-2} - t_x^*)_\pm$ in (47) for some new partition point $t_x^* \ne t_1$. The forward-step algorithm continues at each step by selecting the set of univariate or multivariate terms that, for a threshold $t_x^*$, $t_y^*$ or $t_z^*$ discovered using exhaustive search, contributes most to "improving" model fit. The sets of candidates for each subsequent forward step of the SMASTAR algorithm are nondecreasing in size and are based on previously selected terms of the model. As discussed in Chapter II, the forward-step algorithm is followed by a backward-step algorithm that trims excess terms of the model that no longer sufficiently contribute to the model fit. And again, both the forward and backward steps of the algorithm use GCV* (23) to evaluate model fit versus model complexity (Chapter V discusses alternative model selection criteria).
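The exhaustive threshold search in a single forward step can be sketched with NumPy as below. This is a simplified illustration, not Friedman's implementation: it refits each candidate by ordinary least squares, uses every interior observed value as a candidate knot, forbids a variable from appearing twice in one product (as in (48), where $X_{t-2}$ does not interact with itself), and ranks candidates by residual sum of squares. Because every candidate at a given step adds the same number of terms and knots, that ranking agrees with a GCV-type criterion whose complexity penalty depends only on those counts. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def hinge_pair(v, knot):
    """Reflected pair of truncated linear functions (v - knot)_+ and (knot - v)_+."""
    return np.maximum(v - knot, 0.0), np.maximum(knot - v, 0.0)


def rss(A, y):
    """Residual sum of squares of the least-squares fit of y on the columns of A."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)


def forward_step(B, parent_vars, X, y):
    """One simplified forward step.

    B           : current basis matrix (first column is the constant 1)
    parent_vars : for each column of B, the set of predictor indices it uses
    X           : candidate lagged predictors, one column per variable
    Returns (rss, (parent, variable, knot)) for the best reflected pair."""
    best = (np.inf, None)
    for p, used in enumerate(parent_vars):
        for v in range(X.shape[1]):
            if v in used:                              # at most once per product
                continue
            for knot in np.unique(X[:, v])[1:-1]:      # interior observed values
                plus, minus = hinge_pair(X[:, v], knot)
                A = np.column_stack([B, B[:, p] * plus, B[:, p] * minus])
                score = rss(A, y)
                if score < best[0]:
                    best = (score, (p, v, float(knot)))
    return best


# Synthetic check: y has a single kink at 0.5 in the first predictor.
x0 = np.linspace(0.0, 1.0, 11)
x1 = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5], dtype=float)
X = np.column_stack([x0, x1])
y = 2.0 * np.maximum(x0 - 0.5, 0.0)
B = np.ones((11, 1))
score, (p, v, knot) = forward_step(B, [set()], X, y)
print(p, v, knot, score < 1e-12)
```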

Again, for $t = 1, 2, \ldots, N$, let $\{Y_t\}$ and $\{Z_t\}$ denote the input time series and $\{X_t\}$ the output time series for a time series system that we wish to model. Let the p predictor variables in MARS for the tth value in a time series $\{X_t\}$ be $X_{t-1}, X_{t-2}, \ldots, X_{t-d_1}$; $Y_t, Y_{t-1}, \ldots, Y_{t-d_2}$; and $Z_t, Z_{t-1}, \ldots, Z_{t-d_3}$, which we represent as $\mathbf{X}_{t-1}^{(d_1)}$, $\mathbf{Y}_t^{(d_2+1)}$ and $\mathbf{Z}_t^{(d_3+1)}$, respectively. Following (30), the functional form of the SMASTAR model that estimates $X_t$ is

$$\hat{X}_t = \sum_{j=1}^{s} a_j B_j\left(\mathbf{X}_{t-1}^{(d_1)}, \mathbf{Y}_t^{(d_2+1)}, \mathbf{Z}_t^{(d_3+1)}\right), \qquad (49)$$

so that $\hat{X}_t$ is an additive function of the product spline basis functions $\{B_j(\mathbf{X}_{t-1}^{(d_1)}, \mathbf{Y}_t^{(d_2+1)}, \mathbf{Z}_t^{(d_3+1)})\}_{j=1}^{s}$ associated with the subregions $\{R_j\}_{j=1}^{s}$. As with the ASTAR time series model (31), the functional form of the SMASTAR time series model may be expanded using the ordered sequences of truncated spline functions (20 and 21) that define each product spline basis function.

Let a and b be dummy variables that index the ordered sequence of truncated spline functions $\{T_{a,r_b}(\mathbf{X}_{t-1}^{(d_1)}, \mathbf{Y}_t^{(d_2+1)}, \mathbf{Z}_t^{(d_3+1)})\}$ such that $0 < a < b < j$. Also, to account for the additional complexity of a multivariate time series system, let $\mathbf{v}_j = (\pm v, t, l)$ represent a 3-tuple associated with the truncated spline function $T_{a,r_b}(\cdot)$ whose components identify: ±, the direction of the truncated spline (left or right); v, the specific predictor variable; t, the partition point; and l, the input time series used as predictor variables. Given this additional notation, the functional form of the SMASTAR time series model for the tth value in a time series $\{X_t\}$ using this expansion is

5 

= n N"v(^t-v-0]+ (50) 

where the argument $(\mathbf{X}_{t-1}^{(d_1)}, \mathbf{Y}_t^{(d_2+1)}, \mathbf{Z}_t^{(d_3+1)})$ of $T_{a,r_b}$ is suppressed for simplicity. Again, note that the truncated spline functions act in only one dimension although their argument is a vector of predictor variables.
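One way to make the 3-tuple bookkeeping concrete is sketched below in Python. Each factor of a basis function is encoded as a hypothetical `(sign, series, lag, knot)` tuple, which plays the role of $\mathbf{v}_j = (\pm v, t, l)$; a knot of `None` is used here to mark a purely linear factor such as $X_{t-1}$. A basis function is the product of its factors, and $\hat{X}_t$ is the coefficient-weighted sum of basis functions, as in (49). The encoding and the names are assumptions for illustration.

```python
def factor_value(sign, w, knot):
    """One factor: linear w when knot is None, else the truncated spline
    (w - knot)_+ for sign = +1 or (knot - w)_+ for sign = -1."""
    if knot is None:
        return w
    return max(sign * (w - knot), 0.0)


def basis_value(factors, data, t):
    """Product of the factors of one basis function at time t.

    Each factor is a hypothetical (sign, series, lag, knot) tuple; lag 0 means the
    current value. An empty factor list is the constant basis function 1."""
    value = 1.0
    for sign, series, lag, knot in factors:
        value *= factor_value(sign, data[series][t - lag], knot)
    return value


def predict(coefs, bases, data, t):
    """X_hat_t = sum_j a_j B_j, as in (49)."""
    return sum(a * basis_value(f, data, t) for a, f in zip(coefs, bases))


# Hypothetical two-term model: 1 (constant) + 2 (Z_{t-1} - 0)_+ (10 - X_{t-2})_+.
data = {"X": [4.0, 6.0, 8.0, 9.0], "Z": [-1.0, 2.0, 3.0, -5.0]}
bases = [[], [(+1, "Z", 1, 0.0), (-1, "X", 2, 10.0)]]
print(predict([1.0, 2.0], bases, data, 3))   # 25.0
```

The same encoding also covers univariate ASTAR terms; for instance the (1),(4) term of ASTAR Model 9 is `[(+1, "X", 1, None), (-1, "X", 4, 17.1)]` with coefficient -0.032.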

By modeling a time series system using the MARS algorithm, we overcome some of
the limitations of the semi-multivariate TAR modeling approach. The MARS methodology 
provides a systematic procedure for deriving a nonlinear semi-multivariate time series model 
that is naturally continuous in the domain of the predictor variables. As shown in Chapter II 
and later in Chapter IV with the yearly Wolf sunspot numbers and Granite Canyon data 
sets, ASTAR models of univariate time series can possess multiple thresholds and high level 
predictor variable interactions. This construction has now been extended to the multivariate 
setting with SMASTAR models, which can also possess multiple thresholds and high level 
predictor variable interactions. However, now the threshold values and predictor variable 
interactions can take place among the cross-correlated, lagged predictor variables of a semi- 
multivariate time series system. In contrast, the semi-multivariate TAR methodologies of 
Tong (1985) and Tsay (1989) focus on scatterplot analysis and the evaluation of empirical 
percentiles of preselected threshold variable candidates, which is time consuming and may 
not be sufficient for identifying changes in the physical behavior of a nonlinear time series 
system. Also the semi-multivariate TAR model is still burdened with the limitations of 
a univariate TAR model, i.e., a discontinuous threshold model created with the piecewise 
linear models from several disjoint subregions of a domain D of the predictor variables. 
1. Semi-Multivariate Nonlinear Threshold Modeling of the Vatnsdalsa River

As an illustration of SMASTAR’s ability to model an actual semi-multivariate 
time series system, the riverflow (Tong, 1985) for the Vatnsdalsa River in Iceland from
1972 to 1974 is analyzed. A riverflow at a given location and time is an output of a complex 
time series system with inputs that include aspects of the geography, geology, meteorology 
and topography within the river’s region of flow. Extensive literature is available on the 
modeling complexities of riverflow and will not be revisited here other than to state that 
the use and control of riverflow is of great concern in many countries of the world. Also, 
riverflow data generally has a very non-normal distribution, in part because of the nonlinear
seasonal variations of the system variables and in part because of the difficulty in capturing 
all of the influential variables within the modeling effort. Lawrance and Kottegoda (1977) 
provide an excellent historical review of statistical hydrology and discuss stochastic modeling 
of riverflow with the goal that “ ... models should be able to reproduce, in simulation, 
sequences of flows or lake levels or rainfalls, which are statistically indistinguishable from 
the relevant historical sequence.” This prescription in essence permits prediction and the 
study of physical changes that can affect the hydrological system, e.g., a dam in the case of 
a riverflow system. 

The Vatnsdalsa riverflow data, Figure 21, consist of the river's average daily rate of flow ($X_t$) in m³/sec., the daily precipitation ($Y_t$) in mm, and the average daily temperature ($Z_t$) in °C at the Hveravellir meteorological station in Iceland for the period from 1972 to 1974. For this period, daily riverflow ranges from 3.67 to 54.0 m³/sec., with a mean of 8.94 m³/sec.; daily precipitation ranges from 0.0 to 79.3 mm, with a mean of 2.51 mm; and daily temperature ranges from −22.4 to 13.9 °C, with a mean of −0.44 °C. Both riverflow and temperature are highly autocorrelated time series, with lag 1 correlations of .92 and .90, respectively. The precipitation record is translated forward by one day because the precipitation data are recorded at a different time of day than the temperature and riverflow data. An extensive discussion of the Vatnsdalsa riverflow system is provided by Gudmundsson (1970) and Tong et al. (1985).
[Figure 21 plot: Vatnsdalsa River Data (1972-1974); panels of riverflow (m³/sec.), temperature (°C) and precipitation (mm) versus time.]

Figure 21. The record of daily Vatnsdalsa riverflow, temperature and precipitation for 1972 to 1974, taken at the Hveravellir meteorological station in Iceland. The range of values for daily riverflow for this period is 3.67 to 54.0 m³/sec., with a mean value of 8.94 m³/sec.; the range of values for daily precipitation is 0.0 to 79.3 mm, with a mean of 2.51 mm; the range of values for daily temperature is −22.4 to 13.9 °C, with a mean value of −0.44 °C. The precipitation record is translated forward by one day due to the difference between the time of day at which the precipitation data are recorded and the time at which the temperature and riverflow data are recorded.
Although the riverflow record is relatively short, 1096 days, it does provide a basis for SMASTAR model development and analysis. The primary interest in modeling the Vatnsdalsa riverflow with MARS was to determine whether MARS could produce reasonable SMASTAR models in a semi-multivariate time series setting and, if so, whether the SMASTAR models could identify reasonable nonlinear structure in the riverflow data, e.g., changes to riverflow due to temperatures above and below freezing. Finally, could a SMASTAR model be used to predict riverflow one day ahead?

Graphical analysis of the riverflow data, Figure 21, reveals an extremely high riverflow each spring. Precipitation that falls as snow during the winter accumulates until the temperature rises sufficiently in the spring to release it to the riverflow system. Note that the high riverflow corresponds to the general rise in temperature during the early months of each year. Also, a considerable shift in the overall riverflow occurs in 1974 that is not evident during the previous two years. The spring riverflow during 1974 is higher and of longer duration than the spring riverflows of the previous two years. This surge gives way to an extremely low riverflow in the latter half of 1974, which again is not characteristic of riverflow during the same period of the previous two years. These severe changes in the riverflow structure for 1974 can be attributed to a combination of extremely high rainfall and a rapid warming of the snowpack that occurred earlier in 1974 than in the previous two years.

The empirical density functions for the riverflow [top], temperature [middle] and precipitation [bottom] data are shown in Figure 22. This figure should be interpreted with the understanding that the data are clearly seasonal. The y-axis (density) scale is the same for all three plots, while each x-axis is scaled to the range of the corresponding time series for the years 1972-1974. The empirical density function of the temperature is relatively symmetric, while the empirical density functions of the riverflow and precipitation data are extremely skewed. The skewness in the precipitation data is a result of the heavy but infrequent precipitation that occurs each year. The skewness in the riverflow data can be attributed to the high riverflow that occurs each spring. The skewed distributions of these data sets suggest the possible use of transformations for symmetry (normality) and variance stabilization to moderate the influence of the extreme values. Transformations were considered for the precipitation data. However, riverflow is the output stream that we are modeling. Therefore, we chose to model it directly, to avoid the difficulties associated with inverse transformations for purposes of analysis and prediction.

Many other modeling methodologies could be used for this type of semi-multivariate hydrological data. One method, previously discussed, is to develop semi-multivariate TAR models for various regions of the predictor variable space. The semi-multivariate TAR modeling effort for the Vatnsdalsa River data is briefly discussed in the next section. Other methods consider models consisting of a fixed signal plus noise. However, as with the yearly Wolf sunspot numbers, attempts to model the data with a fixed-period cyclical signal plus (possibly correlated) noise have failed, because the cyclical component in the spectrum of this riverflow system is quite spread out. In particular, note in Figure 21 the size and the shift in time of the Vatnsdalsa's spring riverflow in 1974, as compared to its spring riverflow during the previous two years.

a. TAR and SMASTAR Models of the Vatnsdalsa River (1972-1974) 

Tong et al. (1985) considered a series of semi-multivariate linear and TAR time series models for the Vatnsdalsa riverflow data (1972-1974). Their goal, to develop nonlinear models for purposes of simulation along with establishing relationships between riverflow and important meteorological variables, met with limited success due to the limitations of the TAR methodology. Also, the TAR models included $Y_t$ and $Z_t$, i.e., same-day precipitation and temperature. Although a model that includes $Y_t$ and $Z_t$ does permit analysis of the "immediate" influence of temperature and precipitation on riverflow, it essentially prohibits the use of the model for riverflow prediction. Several semi-multivariate time series models from Tong et al. (1985) are of interest.

The first model from Tong et al. (1985), Tong Model 1 of the Vatnsdalsa river system, shown in Figure 23, is the ordinary semi-multivariate linear time series model for riverflow during 1972 with only precipitation and temperature as the system inputs, i.e., without lagged riverflow as a model predictor variable. Tong Model 1 is

$$X_t = 9.40 + 0.11\,Y_t + 0.11\,Y_{t-1} - 0.01\,Z_t + \varepsilon_t, \qquad (51)$$

where $\varepsilon_t$ is assumed to be Gaussian white noise and the standard error of the fitted residuals is $\sigma_\varepsilon = 4.64$. The poor quality of this linear model is revealed by its inability to capture the sharp structure of high riverflow during the spring runoff. Also, as noted by Tong et al. (1985), the magnitude of $\sigma_\varepsilon$ is larger than the average value of $X_t$, and it is difficult to explain the rationale for the negative coefficient on the temperature variable $Z_t$. The shortcomings of this model indicate the importance of lagged riverflow in capturing the structure of the riverflow system.

[Figure 22 plot: empirical density functions of riverflow (m³/sec.), temperature and precipitation.]

Figure 22. The empirical density functions of the riverflow, temperature and rainfall data for the Vatnsdalsa riverflow system for 1972 to 1974, taken at the Hveravellir meteorological station in Iceland. The y-axis (density) scale is the same for all plots, while each x-axis is scaled to the range of the corresponding time series for the years 1972-1974. The empirical density function of the temperature is relatively symmetric. In contrast, the empirical density functions of the riverflow and precipitation data are extremely skewed. The skewness in the precipitation data is a result of the heavy but infrequent precipitation that occurs each year, while the skewness in the riverflow data can be attributed to the high riverflow that occurs each spring. This figure should be interpreted with the understanding that the data are clearly seasonal.

The second model from Tong et al. (1985), Tong Model 2 of the Vatnsdalsa river system, shown in Figure 24, is the ordinary semi-multivariate linear time series model for riverflow during 1972 with precipitation, temperature and riverflow as the system inputs. Tong Model 2 is

$$
\begin{aligned}
X_t = {} & 0.73 + 1.12\,X_{t-1} - 0.23\,X_{t-2} + 0.12\,X_{t-3} - 0.09\,X_{t-4} \\
& + 0.09\,Y_t - 0.03\,Y_{t-1} - 0.04\,Y_{t-2} \\
& + 0.01\,Z_t + 0.07\,Z_{t-1} - 0.06\,Z_{t-2} + 0.02\,Z_{t-3} + \varepsilon_t,
\end{aligned}
\qquad (52)
$$

where $\varepsilon_t$ is assumed to be Gaussian white noise and the standard error of the fitted residuals is $\sigma_\varepsilon = 1.68$. To simplify the presentation of the more complex semi-multivariate models that follow, using notation from Tong (1985), Tong Model 2 may be rewritten as

$$X_t = (0.73 \,\|\, 1.12, -0.23, 0.12, -0.09 \,\|\, 0.09, -0.03, -0.04 \,\|\, 0.01, 0.07, -0.06, 0.02) + \varepsilon_t,$$

where $\|$ separates the coefficients of the lagged predictor variables of the different time series. The fitted values and residuals of Tong Model 2, shown in Figure 24, are a considerable improvement over those of Tong Model 1, shown in Figure 23. The analysis of Tong Model 2, using equation (52), reveals the immediate and lagged influence of all three predictor variables. Also, in the absence of present and lagged rainfall, and assuming that present and lagged temperatures are close to 0 °C, i.e., $Y_t = Y_{t-1} = Y_{t-2} = 0$ and $Z_t = Z_{t-1} = Z_{t-2} = Z_{t-3} = 0$, this model's riverflow reaches a steady-state flow of about 9 m³/sec., which is reasonable based on the analysis of the riverflow data in Figure 21. However, there are still several instances when the fitted values of Model 2 deviate from the structure of the actual riverflow (the fitted values actually indicate negative riverflow). These deviations occur most often during the period when the temperature is rising rapidly in early spring, and thus are indicative of the nonlinear relationships that exist among the predictor variables in this time series system.

Figure 23. Vatnsdalsa riverflow data during 1972 versus the fitted (predicted) values (top) and residuals (bottom) for the ordinary semi-multivariate linear time series model, Tong Model 1, from Tong et al. (1985). Tong Model 1 uses precipitation $Y_{t-i}$ and temperature $Z_{t-i}$ as system inputs, i.e., lagged riverflow $X_{t-i}$ is not used as a predictor variable. The standard error of the fitted residuals, $\sigma_\varepsilon = 4.64$, is larger than the average value of the riverflow. The poor quality of this model's fit is revealed by its inability to capture the sharp structure of high riverflow during the spring runoff. This model reveals the importance of lagged riverflow in capturing the structure of the riverflow system.
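The steady-state flow of about 9 m³/sec. quoted for Tong Model 2 follows from setting every precipitation and temperature term in (52) to zero and solving $X = 0.73 + (1.12 - 0.23 + 0.12 - 0.09)\,X$, which gives $X = 0.73/0.08 \approx 9.1$. The short Python sketch below checks this arithmetic and also iterates the noise-free lagged-riverflow recursion to show that it settles at the same value; the function names are illustrative only and are not part of the original analysis.

```python
# Constant and lagged-riverflow coefficients of Tong Model 2, equation (52).
# With all precipitation and temperature inputs set to zero, only these remain.
CONST = 0.73
AR_COEFS = [1.12, -0.23, 0.12, -0.09]  # coefficients of X_{t-1}, ..., X_{t-4}


def steady_state(const, ar_coefs):
    """Fixed point X = const + sum(ar_coefs) * X of the recursion."""
    return const / (1.0 - sum(ar_coefs))


def iterate_noise_free(const, ar_coefs, x0=5.0, steps=300):
    """Run X_t = const + sum_k a_k X_{t-k} with no noise and no exogenous input."""
    history = [x0] * len(ar_coefs)  # most recent value is last
    for _ in range(steps):
        lagged = reversed(history[-len(ar_coefs):])  # X_{t-1}, X_{t-2}, ...
        x_next = const + sum(a * x for a, x in zip(ar_coefs, lagged))
        history.append(x_next)
    return history[-1]


print(round(steady_state(CONST, AR_COEFS), 3))        # 9.125
print(round(iterate_noise_free(CONST, AR_COEFS), 3))  # 9.125
```

The lagged-riverflow coefficients sum to 0.92, so the long-run gain on the constant is 1/0.08 = 12.5; the iteration converging from an arbitrary starting value is consistent with the recursion being stable.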

Figure 24. Vatnsdalsa riverflow data during 1972 versus the fitted (predicted) values (top) and residuals (bottom) for the ordinary semi-multivariate linear time series model, Tong Model 2, from Tong et al. (1985), using precipitation $Y_{t-i}$, temperature $Z_{t-i}$ and riverflow $X_{t-i}$ as system inputs. The standard error of the fitted residuals is $\sigma_\varepsilon = 1.68$. The fit of this model is a considerable improvement over Tong Model 1, Figure 23. However, there are still several instances when the fitted values of the model deviate from the structure of the actual riverflow (the fitted values actually indicate negative riverflow). This occurs most often during the period when the temperature is rising rapidly in early spring and is indicative of the nonlinear relationships that exist among the predictor variables in this time series system.

In response to the difficulties of the ordinary semi-multivariate linear time series models, Tong et al. (1985) propose several semi-multivariate TAR models for the Vatnsdalsa riverflow system. The methodology for developing semi-multivariate TAR models was discussed at the beginning of Section B. The progressive use of this methodology resulted in a final semi-multivariate TAR model from Tong et al. (1985), Tong Model 5 of the Vatnsdalsa river system, using lagged riverflow $X_{t-1}, \ldots, X_{t-10}$, lagged precipitation $Y_t, \ldots, Y_{t-10}$, and lagged temperature $Z_t, \ldots, Z_{t-10}$ as the predictor variables. Using the notation developed from (46) and (52), Tong Model 5 for the period 1972 to 1974 is

$$
X_t = \begin{cases}
(c_1 \,\|\, 1.06, -0.26, 0.09, -0.11, 0.08 \,\|\, 0.02, -0.03, 0.01, -0.02 \,\|\, -0.02, -0.01, -0.00, 0.01) + \varepsilon_t^{(1)} & \text{if } Z_t \le -2, \\
(c_2 \,\|\, 0.97, -0.29, 0.04, 0.11 \,\|\, 0.53, 0.02, -0.02 \,\|\, 0.03, 0.12, -0.04, -0.02) + \varepsilon_t^{(2)} & \text{if } -2 < Z_t \le 2, \\
(c_3 \,\|\, 1.38, -0.70, 0.47, 0.02, -0.19, -0.02, 0.34, -0.23 \,\|\, -0.59, 0.07, -0.11, -0.05, 0.07, 0.13, -0.25 \,\|\, 0.03, 0.03, -0.01, 0.04, -0.03, -0.04, 0.13, 0.01) + \varepsilon_t^{(3)} & \text{if } 2 < Z_t \le 5, \\
(c_4 \,\|\, 1.22, -0.49, 0.30, -0.17, 0.27, -0.26, 0.11 \,\|\, -0.02, -0.02, 0.01, -0.01, 0.02, -0.04 \,\|\, 0.01, 0.01, -0.01, -0.01, 0.01, -0.02) + \varepsilon_t^{(4)} & \text{if } Z_t > 5,
\end{cases}
\qquad (53)
$$

where $c_1, \ldots, c_4$ are the constant terms of the four regimes, the errors $\varepsilon_t^{(j)}$ of the four regions are assumed to be Gaussian white noise sequences that are independent of each other, and the standard error of the pooled fitted residuals is $\sigma_\varepsilon = 1.69$ m³/sec. Tong Model 5 uses 73 parameters in four disjoint subregions of the predictor variables, described by the univariate thresholds −2, 2 and 5 on the lag 0 temperature variable $Z_t$.
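The figure of 73 parameters can be reconciled with equation (53): each regime carries a constant plus full blocks of lagged riverflow, precipitation and temperature coefficients, and the three thresholds are counted as parameters too. The Python sketch below assumes the block sizes as read from (53), namely (5, 4, 4), (4, 3, 4), (8, 7, 8) and (7, 6, 6) coefficients for (riverflow, precipitation, temperature) in regimes 1 to 4. It also shows how a regime is selected from the same-day temperature; assigning a value that equals a threshold to the lower regime is an assumed convention, and the names `regime`, `tar_parameter_count` and `BLOCK_SIZES` are hypothetical.

```python
from bisect import bisect_left

THRESHOLDS = [-2.0, 2.0, 5.0]  # thresholds on same-day temperature Z_t (deg C)

# Coefficients per block in each regime of equation (53), as read from the text:
# (lagged riverflow X, precipitation Y, temperature Z).
BLOCK_SIZES = [(5, 4, 4), (4, 3, 4), (8, 7, 8), (7, 6, 6)]


def regime(z_t, thresholds=THRESHOLDS):
    """Index (0-3) of the regime containing z_t.

    A value equal to a threshold goes to the lower regime (assumed convention),
    i.e. regimes are Z <= -2, -2 < Z <= 2, 2 < Z <= 5 and Z > 5.
    """
    return bisect_left(thresholds, z_t)


def tar_parameter_count(block_sizes, thresholds):
    """Regime constants plus all block coefficients plus the thresholds."""
    coefficients = sum(1 + sum(sizes) for sizes in block_sizes)  # +1 per constant
    return coefficients + len(thresholds)


print(tar_parameter_count(BLOCK_SIZES, THRESHOLDS))                  # 73
print([regime(z) for z in (-3.0, -2.0, 0.0, 2.0, 3.5, 5.0, 6.0)])   # [0, 0, 1, 1, 2, 2, 3]
```

Because every regime is full size up to its maximum lag, the count grows with the largest lag in each regime; this is the contrast with SMASTAR's subset selection drawn below.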

Figure 25 is a plot of the fitted values (top) and residuals (bottom) for Tong Model 5 of the Vatnsdalsa River system. Tong Model 5 appears to capture the overall structure of the riverflow data within each disjoint subregion described by the model. Figure 26 contains plots of the autocorrelation function [top] and the normalized cumulative periodogram [bottom] of the fitted residuals from the second subregion of Tong Model 5 ($-2 < Z_t \le 2$). If the fitted residuals of Tong Model 5 are truly independent, then the fitted residuals in each of the four model subregions should also be independent. The fitted residual autocorrelation plot, with approximate individual 95% confidence intervals for zero correlation, shows that significant short-term residual autocorrelation still exists. Also, the normalized cumulative periodogram plot, with a reference line for the normalized cumulative spectrum of Gaussian white noise and 90% Kolmogorov-Smirnov (K-S) bounds, shows that we should reject the hypothesis that the fitted residuals from the second subregion are Gaussian white noise. These results are similar for the other subregions of the model.
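The two residual checks described above can be sketched in Python with NumPy: sample autocorrelations compared with the approximate individual 95% bounds $\pm 1.96/\sqrt{n}$, and the normalized cumulative periodogram compared with the straight line expected for white noise, using K-S bounds $\pm K/\sqrt{q}$, where $q$ is the number of Fourier frequencies strictly between 0 and 1/2 and $K \approx 1.22$ at the 90% level. This is a generic sketch of the standard test under those assumptions, not the software used to produce Figure 26; the function names are hypothetical.

```python
import numpy as np


def sample_acf(x, nlags=20):
    """Sample autocorrelations r_1, ..., r_nlags of a residual series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, nlags + 1)])


def cumulative_periodogram(x):
    """Normalized cumulative periodogram at Fourier frequencies j/n, j = 1..q."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    q = (n - 1) // 2  # frequencies strictly between 0 and 1/2
    power = np.abs(np.fft.rfft(x)[1:q + 1]) ** 2
    return np.cumsum(power) / power.sum(), q


def whiteness_checks(resid, nlags=20, ks_k=1.22):
    """ACF and cumulative-periodogram checks; ks_k = 1.22 gives ~90% K-S bounds."""
    n = len(resid)
    r = sample_acf(resid, nlags)
    acf_bound = 1.96 / np.sqrt(n)
    cum, q = cumulative_periodogram(resid)
    expected = np.arange(1, q + 1) / q  # straight line for white noise
    ks_dev = float(np.max(np.abs(cum - expected)))
    ks_bound = float(ks_k / np.sqrt(q))
    return {
        "acf": r,
        "acf_outside": int(np.sum(np.abs(r) > acf_bound)),
        "ks_deviation": ks_dev,
        "ks_bound": ks_bound,
        "reject_white_noise": ks_dev > ks_bound,
    }


# A pure sinusoid puts all its power at one frequency, so the cumulative
# periodogram jumps far outside the K-S bounds.
t = np.arange(1000)
periodic = np.sin(2 * np.pi * 50 * t / 1000)
print(whiteness_checks(periodic)["reject_white_noise"])  # True
```

Applied to the residuals of one subregion at a time, as in Figure 26, the same function gives a per-regime verdict.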

In summary, although Tong Model 5 appears to capture the overall structure of the riverflow data, neither it nor the other semi-multivariate TAR models of the Vatnsdalsa riverflow system produced Gaussian, or even uncorrelated, residuals. This, in conjunction with the enormous size of this semi-multivariate TAR model (73 parameters), may reflect the inability of the TAR methodology to capture the complex predictor variable interactions present in this riverflow system. Note that the maximum lag of a time series used for predictor variables in a semi-multivariate TAR model, e.g., equation (53), may differ from subregion to subregion. However, within a subregion the semi-multivariate TAR model is of full size, i.e., all autoregressive terms for each input time series up to the maximum lag are included. Thus there is no subset selection of the predictor variables in the semi-multivariate TAR model. In contrast, the SMASTAR methodology permits subset selection of the lagged predictor variables used from each input time series.

Figure 25. Vatnsdalsa riverflow data for the period 1972-1974 versus the fitted (predicted) values (top) and residuals (bottom) for the final semi-multivariate TAR model, Tong Model 5, from Tong et al. (1985). The semi-multivariate TAR model for the riverflow at time $t$, $X_t$, is a function of lagged riverflow $X_{t-j}$ for $j = 1, \ldots, 10$, and precipitation $Y_{t-i}$ and temperature $Z_{t-i}$ for $i = 0, \ldots, 10$. The final model contains 73 parameters in 4 disjoint subregions that are described by the 3 temperature thresholds on $Z_t$ at −2, 2 and 5. The standard error of the pooled fitted residuals is 1.69 m³/sec. The use of $Y_t$ and $Z_t$ in the TAR model essentially prohibits the use of the model for prediction.

Figure 26. Fitted residual plots from Tong Model 5. The autocorrelation function (first 20 lags) and the normalized cumulative periodogram of the fitted residuals from the second subregion, $-2 < Z_t \le 2$, of Tong Model 5 from Tong et al. (1985) of the Vatnsdalsa River system for the period 1972-1974. If the fitted residuals of the model are truly Gaussian white noise, then the fitted residuals from each subregion should also be independent. Although the fitted values of the model, Figure 25, appear to capture the overall structure of the riverflow data, the approximate 95% individual confidence bounds show that some short-term residual autocorrelation still exists in the second subregion. Also, the normalized cumulative spectrum of the fitted residuals falls outside the 90% K-S bounds for Gaussian white noise, indicating that we should reject the hypothesis that the fitted residuals are Gaussian white noise. The results of the fitted residual analysis for the other subregions of Tong Model 5 are similar.

Given the preliminary analysis of the Vatnsdalsa riverflow system and the semi-multivariate TAR modeling effort, the MARS algorithm was used to develop SMASTAR models for the Vatnsdalsa riverflow from 1972-1974 using 20 predictor variables: lagged riverflow $X_{t-1}$ to $X_{t-5}$; lagged precipitation $Y_t$ to $Y_{t-7}$, with and without the natural log transformation $Y^*_{t-i} = \ln(1 + Y_{t-i})$; lagged temperature $Z_t$ to $Z_{t-5}$; and a variable for time-of-year effect. The models were initialized using 9 data values from each of the input time series. Different models were obtained by changing the user parameters of the MARS algorithm. These parameters include MI = 3 and 4, the maximum level of lagged predictor variable interaction; MS = 10, 15 and 20, the minimum separation of a lagged predictor variable's partition points; and M = 30, the number of steps during the forward-step algorithm. The SMASTAR models were identified with the SMASTAR version of MARS 2.0 installed on an IBM 3033 computer using VS Fortran. Each of the 3-year models required from 1 to 2 minutes of CPU time. Also, the maximum lags of each predictor variable time series were chosen because of predictor variable constraints within the SMASTAR version of MARS 2.0. More predictor variables could have been modeled using the adjustments to MARS 3.0 that will be discussed in Chapter IV.

The results of the modeling effort indicate that the SMASTAR methodology appears well suited for analysis of semi-multivariate time series systems. As discussed in more detail in the next section, the terms of the 2- and 3-year SMASTAR time series models appear to provide an indication of the underlying physical structure of the riverflow system. Throughout the modeling effort it was interesting to note that, although the time variable was included as a predictor variable, it was never selected as a final model term. In effect, this implies that for this riverflow system and data, the lagged predictor variables have captured the relevant time-dependent structure of the riverflow. Also, while the (riverflow / precipitation variables) and (riverflow / temperature variables) frequently developed interaction terms in the models, there were few direct interactions between the temperature and precipitation variables. The SMASTAR models developed with the natural log transformation $Y^*_{t-i} = \ln(1 + Y_{t-i})$ and a maximum level of interaction of MI = 3 appeared more stable than models developed without the transformation and with MI = 4. As expected, SMASTAR models of the Vatnsdalsa river system are relatively complex compared to the ASTAR models developed for the yearly Wolf sunspot numbers in Chapter II, because interaction terms between cross-correlated predictor variables are permitted.
Figure 27 shows the fitted values and residuals of SMASTAR Model ICE796 of the Vatnsdalsa riverflow for the three years 1972 to 1974. Model ICE796 was selected on the basis of its overall fit and the analysis of its fitted residuals. Model ICE796 was developed using a natural log transformation of the precipitation predictor variable and was permitted to form 1-, 2- and 3-way interactions during a maximum of M = 30 forward steps of the forward-step MARS algorithm. The minimum span between threshold values for a single predictor variable was 15 data values. The model has 37 parameters that include 24 model terms (a constant term and 3 one-way, 8 two-way and 12 three-way interactions) and 13 threshold values (2 for $X_{t-1}$, 1 for $X_{t-3}$; 2 for $Y_{t-1}$, 1 for $Y_{t-2}$, 1 for $Y_{t-5}$, 2 for $Y_{t-6}$; 1 for [illegible], 1 for $Z_{t-1}$, 1 for $Z_{t-3}$, and 1 for $Z_{t-6}$). It can be seen that Model ICE796 captures the overall structure of the riverflow data. The standard error of the fitted residuals is $\sigma_\varepsilon = 1.39$ m³/sec.

Analysis of the fitted residuals from this model, Figure 28, shows that no short-term residual autocorrelation exists, in contrast to the short-term residual autocorrelation that was present in the TAR models. Also, the residuals could be considered independent if they were normally distributed, because the normalized cumulative spectrum of the fitted residuals falls entirely within the 90% K-S bounds around the cumulative spectrum of Gaussian white noise. However, the model residuals still appear non-Gaussian, with the extremely heavy tails that can be expected with this type of riverflow data (figure not shown). Note that SMASTAR Model ICE796 (37 parameters, $\sigma_\varepsilon = 1.39$ m³/sec.), Figure 27, has fewer parameters and a smaller fitted residual variance than Tong's TAR Model 5 (73 parameters, $\sigma_\varepsilon = 1.69$ m³/sec.), Figure 25, and also appears to better capture the structure of the periods of high riverflow that occur each spring. The vertical scales of the plots in Figures 25 and 27 are the same.

[Figure 27 plot: Vatnsdalsa River Data (1972-1974); river flow with model fit (top) and residuals (bottom).]

Figure 27. Vatnsdalsa riverflow data versus the fit (top) and the residuals (bottom) for SMASTAR Model ICE796. The period of the modeling effort is 1972 to 1974. The SMASTAR model for the riverflow at time $t$, $X_t$, is a function of lagged riverflow $X_{t-i}$ for $i = 1, \ldots, 5$, precipitation $Y_{t-j}$ for $j = 0, \ldots, 7$, temperature $Z_{t-k}$ for $k = 0, \ldots, 5$, and a variable for time-of-year effect. The final model contains 37 parameters that include 24 model terms (a constant term and 3 one-way, 8 two-way and 12 three-way interactions) and 13 threshold values (2 for $X_{t-1}$, 1 for $X_{t-3}$; 2 for $Y_{t-1}$, 1 for $Y_{t-2}$, 1 for $Y_{t-5}$, 2 for $Y_{t-6}$; 1 for [illegible], 1 for $Z_{t-1}$, 1 for $Z_{t-3}$, and 1 for $Z_{t-6}$). The standard error of the fitted residuals is $\sigma_\varepsilon = 1.39$ m³/sec.

Figure 28. Fitted residual plots from SMASTAR Model ICE796. The autocorrelation function (first 20 lags) [top] and the normalized cumulative periodogram [bottom] of the fitted residuals from SMASTAR Model ICE796 of the Vatnsdalsa River system for the period 1972-1974. The autocorrelation plot, with approximate 95% individual confidence bounds, shows that no apparent autocorrelation exists in the fitted residuals of Model ICE796. Also, the residuals could be considered independent if they were normally distributed, because the normalized cumulative spectrum of the fitted residuals falls entirely within the 90% K-S bounds around the cumulative spectrum of Gaussian white noise.

b. Two Year SMASTAR Models of the Vatnsdalsa River for Prediction

In the previous section we discussed the development of TAR and SMASTAR time series models for three years of the Vatnsdalsa riverflow. However, these TAR and SMASTAR models included the same-day precipitation and temperature predictor variables, $Y_t$ and $Z_t$, which essentially prohibits the use of the models for riverflow prediction. In this section our objective is prediction; MARS was used to develop SMASTAR models for only 731 days of riverflow, and the remaining 355 days were used for prediction. It is unknown whether TAR models of the Vatnsdalsa riverflow have been developed for purposes of prediction, which would provide a comparison of the predictive capabilities of SMASTAR and TAR semi-multivariate models.

The SMASTAR models were developed for the 731 days (2 years) of Vatnsdalsa riverflow during 1972 and 1973 using 20 predictor variables: lagged riverflow $X_{t-1}$ to $X_{t-5}$; lagged precipitation $Y_{t-1}$ to $Y_{t-8}$, with and without the natural log transformation $Y^*_{t-i} = \ln(1 + Y_{t-i})$; lagged temperature $Z_{t-1}$ to $Z_{t-6}$; and a variable for time-of-year effect. The first 9 data values of each time series were used for initialization. Note again that during this modeling effort we excluded $Y_t$ and $Z_t$ (same-day precipitation and temperature) from the model to permit riverflow prediction for the last 355 days of riverflow during 1974. Again, different models were obtained by changing the user parameters of the MARS algorithm. The parameter selections included MI = 2, 3 and 4, the maximum level of lagged predictor variable interaction; MS = 10, 15 and 20, the minimum separation of a lagged predictor variable's partition points; and M = 15, the number of steps during the forward-step algorithm.
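The 2-year predictor set described above can be assembled from the three daily series by stacking lagged columns. The Python sketch below is illustrative only: the function name `build_lagged_design`, its arguments, and the use of a day-of-year column for the time-of-year variable are assumptions, not the input format of the SMASTAR version of MARS 2.0. Rows start after the first 9 values, matching the initialization described in the text, and same-day precipitation and temperature are excluded, so each row uses only information available on the previous day.

```python
import numpy as np


def build_lagged_design(x, y, z, day_of_year, x_lags=range(1, 6),
                        y_lags=range(1, 9), z_lags=range(1, 7),
                        log_precip=True, n_init=9):
    """Return (design, target, names) for predicting X_t from lagged inputs.

    x, y, z: daily riverflow, precipitation and temperature (equal length).
    Defaults give X_{t-1..t-5}, Y*_{t-1..t-8}, Z_{t-1..t-6} and a time-of-year
    column: 20 predictors. Rows start at t = n_init so every lag is available.
    """
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    if log_precip:
        y = np.log1p(y)  # Y*_t = ln(1 + Y_t)
    max_lag = max(max(x_lags), max(y_lags), max(z_lags))
    if n_init < max_lag:
        raise ValueError("n_init must be at least the largest lag")
    t = np.arange(n_init, len(x))
    cols, names = [], []
    for series, label, lags in ((x, "X", x_lags), (y, "Y", y_lags), (z, "Z", z_lags)):
        for k in lags:
            cols.append(series[t - k])  # value k days before day t
            names.append(f"{label}_t-{k}")
    cols.append(np.asarray(day_of_year, dtype=float)[t])  # time-of-year variable
    names.append("day_of_year")
    return np.column_stack(cols), x[t], names


n = 40
design, target, names = build_lagged_design(
    np.arange(n, dtype=float), np.zeros(n), np.zeros(n), np.arange(n) % 365)
print(design.shape, len(names))  # (31, 20) 20
```

Passing `log_precip=False` gives the untransformed alternative that was also tried, so both predictor sets come from the same code path.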

As with the 3-year modeling effort, the 2-year SMASTAR models appear well suited for analysis of semi-multivariate time series systems. Again, while the (riverflow / precipitation variables) and (riverflow / temperature variables) frequently developed interaction terms in the models, there were few direct interactions between the temperature and precipitation variables. Also, the SMASTAR models developed with the natural log transformation $Y^*_{t-i} = \ln(1 + Y_{t-i})$ and a maximum level of interaction of MI = 3 appeared more stable (less likely to have abnormal changes in riverflow) than models developed without the transformation and with MI = 2 and 4.

Equation (54) details SMASTAR Model ICE486 for the Vatnsdalsa riverflow for the years 1972 and 1973. SMASTAR Model ICE486 was selected from among the other models on the basis of its overall fit and the analysis of its fitted residuals. The presentation of Model ICE486 in equation (54) is intended to take advantage of the tree-like structure that naturally develops as a result of its truncated spline functions and of the stepwise selection methodology within MARS. Model ICE486 was developed using the natural log transformed precipitation and was permitted to form 1-, 2- and 3-way interactions during a maximum of M = 15 forward steps of the forward-step MARS algorithm. The minimum span between threshold values for a single predictor variable was MS = 20 data values. Model ICE486 is

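$$
\begin{aligned}
X_t = {} & \beta_0 + 2.34\,(Y^*_{t-1} - 2.49)_+ \\
& + 1.20\,(X_{t-1} - 3.98)_+ \\
& - 0.200\,(X_{t-1} - 3.98)_+ (0.833 - Y^*_{t-2})_+ \\
& + 0.038\,(X_{t-1} - 3.98)_+ (0.833 - Y^*_{t-2})_+ (14.5 - X_{t-4})_+ \\
& - 0.116\,(X_{t-1} - 3.98)_+ (Y^*_{t-2} - 0.833)_+ \\
& + 0.174\,(X_{t-1} - 3.98)_+ (7.92 - X_{t-2})_+ \\
& - 0.0?\,(X_{t-1} - 3.98)_+ (X_{t-2} - 7.92)_+ (3.2 - Z_{t-1})_+ \\
& - 0.021\,(X_{t-1} - 3.98)_+ (X_{t-2} - 7.92)_+ (Z_{t-1} - 3.2)_+ \\
& + 0.008\,(X_{t-1} - 3.98)_+ (X_{t-2} - 7.92)_+ (2.4 - Z_{t-1})_+ \\
& + 0.012\,(X_{t-1} - 3.98)_+ (X_{t-2} - 7.92)_+ (Z_{t-1} - 2.4)_+ \\
& + 0.008\,(X_{t-1} - 3.98)_+ (X_{t-2} - 7.92)_+ (3.3 - Z_{t-2})_+ \\
& - 0.005\,(X_{t-1} - 3.98)_+ (X_{t-2} - 7.92)_+ (Z_{t-2} - 3.3)_+ .
\end{aligned}
\qquad (54)
$$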

Model ICE486 has 21 parameters, which include 13 terms (a model constant term and 2 one-way, 3 two-way and 7 three-way interactions) and 8 threshold values (1 each on the lagged riverflow predictor variables $X_{t-1}$, $X_{t-2}$ and $X_{t-4}$; on the lagged transformed precipitation variables $Y^*_{t-1}$ and $Y^*_{t-2}$; and on the lagged temperature predictor variable $Z_{t-2}$; and 2 on the lagged temperature predictor variable $Z_{t-1}$). The standard error of the fitted residuals for the model is $\sigma_\varepsilon = 1.27$ m³/sec.
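The building blocks of (54) are MARS truncated linear (hinge) functions, $(u - t)_+ = \max(u - t, 0)$ and its mirror $(t - u)_+$, and products of them. The Python sketch below evaluates a product basis function from a list of `(variable, knot, direction)` factors and applies it to two of the terms of (54), $+1.20\,(X_{t-1} - 3.98)_+$ and $-0.200\,(X_{t-1} - 3.98)_+(0.833 - Y^*_{t-2})_+$. It evaluates only these two terms, not the full model; the variable keys `X1` and `Ystar2` and all function names are hypothetical. The first call shows that terms containing $(X_{t-1} - 3.98)_+$ contribute nothing when $X_{t-1} \le 3.98$.

```python
def hinge(u, knot, direction=+1):
    """Truncated linear function: (u - knot)_+ if direction=+1, (knot - u)_+ if -1."""
    return max(direction * (u - knot), 0.0)


def product_basis(values, factors):
    """Product of hinge factors; factors is a list of (name, knot, direction)."""
    result = 1.0
    for name, knot, direction in factors:
        result *= hinge(values[name], knot, direction)
    return result


# Two terms of (54): +1.20 (X_{t-1} - 3.98)_+ and
# -0.200 (X_{t-1} - 3.98)_+ (0.833 - Y*_{t-2})_+
TERMS = [
    (1.20, [("X1", 3.98, +1)]),
    (-0.200, [("X1", 3.98, +1), ("Ystar2", 0.833, -1)]),
]


def partial_contribution(values, terms=TERMS):
    """Sum of coefficient * basis function over the listed terms only."""
    return sum(coef * product_basis(values, factors) for coef, factors in terms)


print(partial_contribution({"X1": 3.5, "Ystar2": 0.0}))            # 0.0
print(round(partial_contribution({"X1": 5.0, "Ystar2": 0.0}), 4))  # 1.0541
```

Because a product is zero as soon as one factor is zero, a shared parent factor such as $(X_{t-1} - 3.98)_+$ switches a whole branch of the tree-like structure on or off at once.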

Figure 29 shows plots of the fitted values and residuals of Model ICE486 for the Vatnsdalsa riverflow data during 1972 and 1973. Again, note that the precipitation data used in Model ICE486 are the natural log transformed precipitation. Model ICE486 appears to capture the overall structure of the Vatnsdalsa riverflow. Note also that the minimum riverflow for the modeling period is 3.98 m³/sec., which is higher than the minimum riverflow that occurs during the period for which we will use the model for prediction. The size of the 2-year Model ICE486 (21 parameters) and its standard error of the fitted residuals, $\sigma_\varepsilon = 1.27$ m³/sec., when compared with those of the 3-year Model ICE796 (37 parameters, $\sigma_\varepsilon = 1.39$ m³/sec.) and the 3-year Tong Model 5 (73 parameters, $\sigma_\varepsilon = 1.69$ m³/sec.), provide some insight into the change in riverflow structure that occurs between the first two years (1972-1973) and the last year (1974). The 3-year models (1972-1974) require many more parameters than the 2-year model (1972-1973) to account for the change in riverflow structure during 1974.

Model ICE486, Figure 29, appears to overfit and underfit the peaks and troughs about equally as it captures the general structure of the riverflow data. The fitted residuals are examined using the normal probability plot (Figure 30) and the autocorrelation function and normalized cumulative periodogram plots (Figure 31). Analysis of the normal probability plot (Figure 30) shows that the fitted residuals are slightly skewed with extremely heavy tails. Note that the heavy tails could be an indication of different distributions for fitted residuals from different regions of the predictor variable space. Again, unlike Tong Model 5 (Figure 25) and the other TAR models discussed in the previous section, the autocorrelation function of the fitted residuals reveals no evidence of short-term autocorrelation. Also, we could consider the residuals independent if they were normally distributed, because the normalized cumulative spectrum of the fitted residuals falls entirely within the 90% K-S bounds around the cumulative spectrum of Gaussian white noise. However, as with these other models, the fitted residuals still display a pattern of high residual values during periods of high riverflow (Figure 29). This is evidence that we have still not captured all the relevant predictor variables for the periods of high riverflow.
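A normal probability plot like Figure 30 pairs the sorted residuals with standard normal quantiles; heavy tails show up as points bending away from a straight line at both ends, and as positive excess kurtosis. The Python sketch below computes both using only the standard library. The plotting positions $(i - 0.5)/n$ are one common choice and are an assumption here, not necessarily those used for Figure 30; the function names are hypothetical.

```python
from statistics import NormalDist, fmean


def normal_scores(n):
    """Standard normal quantiles at plotting positions (i - 0.5) / n."""
    nd = NormalDist()
    return [nd.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def probability_plot_points(resid):
    """(sorted residual, normal quantile) pairs: residual on the horizontal axis."""
    return list(zip(sorted(resid), normal_scores(len(resid))))


def excess_kurtosis(resid):
    """Moment estimate m4 / m2**2 - 3: about 0 for Gaussian data, > 0 for heavy tails."""
    mu = fmean(resid)
    m2 = fmean([(r - mu) ** 2 for r in resid])
    m4 = fmean([(r - mu) ** 4 for r in resid])
    return m4 / m2 ** 2 - 3.0


heavy = [0.0] * 98 + [-10.0, 10.0]  # mostly small residuals plus two extremes
print(round(excess_kurtosis(heavy), 1))  # 47.0
```

Computing the kurtosis separately for residuals from different regions of the predictor space is one way to probe the suggestion above that the heavy tails come from mixing regional residual distributions.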

c. Interpretation of the Two Year SMASTAR Model ICE486 

The tree-like structure of Model ICE486 (54) provides some insight into the complex interactions of the riverflow system. Three major regions of interest may be identified by visual inspection of the model equation. They include riverflow when it falls below 3.98 m³/sec. (top line), along with the model terms that reflect the direct contributions of the lagged transformed precipitation (term 2 of line 1 and lines 3, 4 and 5) and lagged temperature (lines 7 through 12) variables.

Figure 29. The Vatnsdalsa riverflow data for 1972 and 1973 versus the fitted values (top) and residuals (bottom) for SMASTAR Model ICE486. The SMASTAR model for the riverflow at time $t$, $X_t$, was a function of lagged riverflow $X_{t-1}$ to $X_{t-5}$, lagged transformed precipitation $Y^*_{t-1}$ to $Y^*_{t-9}$, where $Y^*_{t-i} = \ln(1 + Y_{t-i})$, lagged temperature $Z_{t-1}, Z_{t-2}, \ldots$, and a variable for the time-of-year effect. The final model contains 21 parameters: 13 terms and 8 thresholds (one each on the lagged riverflow predictor variables $X_{t-1}$, $X_{t-2}$ and $X_{t-4}$, on the lagged precipitation predictor variables $Y^*_{t-1}$ and $Y^*_{t-2}$, and on the lagged temperature variable $Z_{t-2}$, and two on the lagged temperature variable $Z_{t-1}$). The standard error of the fitted residuals, $\sigma_\epsilon$, was 1.27 m³/sec. The initial nine values of each time series were used to initialize the model.

Figure 30. The normal probability plot of the fitted residuals for SMASTAR Model ICE486 of the Vatnsdalsa River system for the period 1972-1973. The horizontal axis shows the range of the fitted residuals from Model ICE486, while the vertical axis shows the corresponding percentiles of the normal distribution. The plot shows that the fitted residuals from Model ICE486 are slightly skewed, with the extremely heavy tails that we might expect with this type of riverflow data. Note that the heavy tails could be an indication of different distributions for the residuals in different regions of the predictor variable space.

Figure 31. Fitted residual plots from SMASTAR Model ICE486. The autocorrelation function (first 20 lags) [top] and the normalized cumulative periodogram [bottom] of the fitted residuals from SMASTAR Model ICE486 of the Vatnsdalsa River system for the period 1972-1973. The autocorrelation plot, with approximate 95% individual confidence bounds, shows no apparent autocorrelation in the fitted residuals. Also, because the normalized cumulative periodogram of the fitted residuals falls entirely within the 90% K-S bounds for Gaussian white noise, the residuals could be considered independent if they were normally distributed.

Since all terms in (54) that contain the factor $(X_{t-1} - 3.98)_+$ have value 0 when the riverflow falls below 3.98 m³/sec, it is immediately apparent that Model ICE486, in this case, can be reduced to

$$X_t = 3.12 + 2.34\,(Y^*_{t-1} - 2.49)_+ \tag{55}$$

This indicates that when $X_{t-1} < 3.98$ and in the absence of lag 1 precipitation $Y_{t-1}$, the riverflow $X_t$ will fall to a steady-state level of 3.12 m³/sec, the model constant. Note that the minimum riverflow during the modeling period was 3.98 m³/sec. Once the riverflow is reduced to 3.12 m³/sec, a minimum of $Y^*_{t-1} = 2.86$, or $Y_{t-1} = 16.42$ millimeters of rainfall, must occur to raise the riverflow above 3.98 m³/sec, the level at which the other model terms can again ‘kick in’. Also, any time the lag 1 transformed precipitation $Y^*_{t-1}$ exceeds 2.49 units (lag 1 precipitation $Y_{t-1}$ > 11.06 millimeters), this term makes an immediate contribution to the riverflow.
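The arithmetic in the preceding paragraph can be checked with a few lines of Python. This sketch hard-codes the coefficients of (55) and the transformation $Y^* = \ln(1 + Y)$ from the caption of Figure 29; the function names are illustrative only.

```python
import math

CONST, SLOPE, KNOT = 3.12, 2.34, 2.49   # coefficients of the reduced model (55)
FLOOR = 3.98                            # threshold on X_{t-1}, m^3/sec


def hinge(u):
    """Truncated linear function (u)_+ = max(u, 0)."""
    return max(u, 0.0)


def reduced_model(y_star_lag1):
    """Riverflow X_t from (55); valid only while X_{t-1} < 3.98 m^3/sec."""
    return CONST + SLOPE * hinge(y_star_lag1 - KNOT)


def rain_mm(y_star):
    """Invert the transformation Y* = ln(1 + Y) to millimeters of rain."""
    return math.exp(y_star) - 1.0


# Transformed rain needed to lift X_t from 3.12 back up to 3.98 m^3/sec.
y_star_needed = KNOT + (FLOOR - CONST) / SLOPE
print(round(y_star_needed, 2), round(rain_mm(y_star_needed), 2))  # 2.86 16.42
print(round(rain_mm(KNOT), 2))                                    # 11.06
print(reduced_model(0.0))                                         # 3.12
```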

The next region of interest for Model ICE486 (54) occurs when the lag 1 riverflow $X_{t-1}$ is greater than 3.98 m³/sec and includes the model terms that contain lagged transformed precipitation variables (lines 1, 3, 4 and 5 of the equation). These four terms reflect the direct influence of precipitation on the riverflow system. For example, note the positive coefficient of the first precipitation term, involving $Y^*_{t-1}$ (line 1), and the negative coefficient of the last precipitation term, involving $Y^*_{t-2}$ (line 5). If significant precipitation occurs ($Y^*_{t-1} > 2.49$), the immediate (first-day) influence of the lag 1 term $Y^*_{t-1}$ (line 1) is moderated on the second day by the lag 2 precipitation term $Y^*_{t-2}$ (line 5) if the lag 1 riverflow $X_{t-1}$ is greater than 3.98 m³/sec; i.e., the term on line 5 reflects the decrease in river runoff levels 2 days after a significant rainfall.

The last region of interest includes the last 6 terms of the model. These terms reflect the direct influence of temperature on the riverflow system. They include 2 pairs of lag 1 temperature terms in $Z_{t-1}$ (lines 7 through 10) and 1 pair of lag 2 temperature terms in $Z_{t-2}$ (lines 11 and 12). The threshold values of 2.4, 3.2 and 3.3 provide the switching mechanisms needed to modify the behavior of the riverflow system as it is affected by temperature. We can use the coefficients of these model terms and temperature extremes to characterize how temperature affects the model. For example, the coefficients of the model terms that are active (making a nonzero contribution) during successive days of very low temperature (-.014, .008, .008) (lines 7, 9 and 11) and during successive days of rapidly increasing temperature (-.021, .012, .008) (lines 8, 10 and 11) in effect cancel each other out. Under these conditions, temperatures $Z_{t-1}$ and $Z_{t-2}$ appear to have little direct influence on riverflow. In contrast, during periods of very high successive temperatures (-.021, .012, -.005) (lines 8, 10 and 12) and rapidly falling temperatures (-.014, .008, -.005) (lines 7, 9 and 12), the temperature terms contribute to the model by forcing riverflow to lower levels. All of these results are readily identified and seem reasonable.
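The "cancellation" argument can be made concrete by adding up the active coefficients in each regime. This Python sketch simply sums the coefficients quoted in the text; treating the sum as a net effect assumes the hinge values in each regime are of similar size, which is a simplification for illustration rather than part of the model.

```python
# Coefficients quoted above for the temperature terms of Model ICE486 (54)
# that are active in each temperature regime.
regimes = {
    "very low (lines 7, 9, 11)": (-0.014, 0.008, 0.008),
    "rapidly rising (lines 8, 10, 11)": (-0.021, 0.012, 0.008),
    "very high (lines 8, 10, 12)": (-0.021, 0.012, -0.005),
    "rapidly falling (lines 7, 9, 12)": (-0.014, 0.008, -0.005),
}

# Net coefficient per regime: a crude summary that ignores how large each
# hinge value (Z - threshold)_+ actually is on a given day.
nets = {name: round(sum(coefs), 3) for name, coefs in regimes.items()}

for name, net in nets.items():
    print(f"{name:35s} net {net:+.3f}")
```

The net values come out near +0.002 and -0.001 for the first two regimes (near cancellation) and -0.014 and -0.011 for the last two (a net downward push on riverflow).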

d. Predictive Performance of SMASTAR Model ICE486 

We now investigate the predictive performance of Model ICE486, developed and discussed above. SMASTAR Model ICE486 (54) and the riverflow, precipitation and temperature data for 1974 were used to make 1-day (one-step) ahead predictions of the Vatnsdalsa riverflow. Predicting this riverflow for this period is a formidable task due to the extreme shift in the timing and magnitude of the riverflow that occurs during the spring, along with the decrease in riverflow that occurs later in 1974. For example, the minimum riverflow during the modeling period was 3.98 m³/sec, while the minimum riverflow during the prediction period was 3.67 m³/sec.

The prediction effort used two methods. The first method fixes both the model coefficients and the model terms (fixed model), as was done in Chapter II with ASTAR Model 9 of the Wolf sunspot numbers to perform forward-step ahead predictions. The second method fixes the model terms but permits daily updating of the model coefficients (coefficient update) using the latest 731 data values of the riverflow system. For example, the 1-step ahead prediction of Model ICE486 (54) at each value of $t$ during 1974 using coefficient update is obtained by first updating the model coefficients using the data $X_{t-i}$, $Y_{t-i}$ and $Z_{t-i}$ for $i = 1, \ldots, 731$ and then making the 1-step ahead prediction. Updating the model coefficients is just an ordinary linear least-squares regression step, because the threshold values of each model term are fixed. The second method was implemented to determine what impact the changes in riverflow structure during 1974 have on the fixed prediction model, and also because of the nonlinear behavior of the system.
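The "coefficient update" method amounts to refitting an ordinary least-squares regression on a fixed set of basis functions over a moving window. A minimal NumPy sketch under that reading follows; the function name and the layout of the basis matrix `B` are illustrative assumptions, not the Fortran used for this study.

```python
import numpy as np


def rolling_coefficient_update(B, y, window):
    """One-step-ahead predictions with fixed basis functions and refitted
    coefficients.

    B      : (n, p) basis-function values; column 0 should be all ones.
             Row t may use only data available before time t (lagged values).
    y      : (n,) response.
    window : number of most recent rows used for each refit (731 in the text).
    Returns predictions for rows window, ..., n-1.
    """
    B = np.asarray(B, dtype=float)
    y = np.asarray(y, dtype=float)
    preds = []
    for t in range(window, len(y)):
        rows = slice(t - window, t)                  # the latest `window` rows
        coef, *_ = np.linalg.lstsq(B[rows], y[rows], rcond=None)
        preds.append(B[t] @ coef)                    # thresholds stay fixed
    return np.array(preds)
```

Because the thresholds never move, each refit is a linear problem; the fixed-model alternative simply reuses one coefficient vector for every prediction.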

Figures 32-34 contain plots of the actual riverflow versus the 1-step ahead predictions, and the prediction residuals, for the Vatnsdalsa riverflow during 1974. In both cases the model predictions react very well to both the extreme spring transition and the low riverflow that occurs later in the year. However, as expected, the 1-step predictions using coefficient updating (Figure 32) are an improvement over the 1-step fixed-model predictions (Figure 33); the standard errors of the fitted residuals are 2.11 m³/sec and 2.36 m³/sec, respectively. Figure 34 gives the estimated normalized periodogram of the fitted residuals from the 1-step ahead predictions of Model ICE486 using the ‘coefficient update’ prediction model. The cumulative normalized spectrum of the fitted residuals falls outside the 90% K-S bounds for Gaussian white noise, indicating that the prediction residuals are not Gaussian white noise.



Figure 32. The actual riverflow versus 1-step ahead predictions [top] and errors [bottom] from Model ICE486 for the Vatnsdalsa riverflow data (1974) with coefficient updating (coefficient update). The standard error of the fitted residuals is 2.11 m³/sec.

Figure 33. The actual riverflow versus 1-step ahead predictions [top] and errors [bottom] from Model ICE486 for the Vatnsdalsa riverflow data (1974) without coefficient updating (fixed model). The standard error of the fitted residuals is 2.36 m³/sec.

Figure 34. The estimated normalized periodogram of the fitted residuals of SMASTAR Model ICE486 from the Vatnsdalsa riverflow data for 1974 using the ‘coefficient update’ prediction model. The cumulative normalized spectrum of the fitted residuals falls outside the 90% K-S bounds for Gaussian white noise, indicating that we should reject the hypothesis that the fitted residuals from this prediction effort are Gaussian white noise.
C. SUMMARY



This chapter extended the ASTAR modeling methodology developed in Chapter II to a semi-multivariate ASTAR (SMASTAR) modeling methodology, which appears well suited to taking into account the complex nonlinear interactions among multivariate, cross-correlated, lagged predictor variables of a time series system. Using the Vatnsdalsa riverflow system as an example, Tong et al. (1985) showed that standard autoregressive models were incapable of capturing the complexities of the cross-correlated predictor variables of this type of time series system. Also, the semi-multivariate TAR methodology and model structure appear incapable of capturing these complexities with a parsimonious model. However, the MARS methodology in the form of a SMASTAR model appears to account better for the complex relationships between the cross-correlated predictor variables and seems capable of providing semi-multivariate nonlinear time series models for prediction. Moreover, the MARS methodology, although computer intensive, provides a systematic approach to modeling time series systems.

It is important to note that lagged riverflow, precipitation and temperature may provide only rudimentary insights into riverflow modeling and prediction and may not be sufficient for developing a model of this semi-multivariate time series system. Other predictor variables, such as wind conditions in the case of the Vatnsdalsa riverflow system, may provide important information for modeling the riverflow system. As with any regression or time series modeling effort, one can never be sure of having all the relevant predictor variables. However, this additional complexity can be handled in MARS 3.0 with the modifications discussed in Chapter IV.

Other data sets, such as the Canadian lynx data and the sea-surface temperature data that will be discussed in Chapter IV, as well as many other riverflow data sets, exhibit ‘periodic’ behavior, and it would be of interest to model them with the SMASTAR procedure. Of special interest are data sets dominated by a fixed-cycle oscillation. The Vatnsdalsa riverflow record modeled in this chapter may not have been long enough to establish satisfactorily the fixed yearly oscillation that appears to exist in the SMASTAR models.






IV. MODELING OF TIME SERIES SYSTEMS USING MARS 3.0 



The univariate and semi-multivariate ASTAR models developed in Chapters II and III are the result of applying the alpha test version of the MARS 2.0 program (released in December 1989) to the sunspot numbers and Vatnsdalsa riverflow data sets. Friedman released the alpha test version of the MARS 3.0 program in December 1990. The MARS 3.0 program is a collection of subroutines that implement the multivariate adaptive regression spline strategy described in Chapter II. Changes in the MARS 3.0 program include plotting subroutines that are useful for interpreting a MARS model, and logistic regression subroutines for modeling categorical variables. These subroutines are of interest but have not been fully investigated for application in a time series setting. The subroutines used in time series analysis were largely unaffected by the update from MARS 2.0 to MARS 3.0.

Our use of the MARS 2.0 program for univariate and multivariate time series modeling and analysis was largely time series specific. For example, our time series modification of the MARS 2.0 program permitted only 20 lagged predictor variables, and there was always a residual question as to whether the model would, in some sense, converge if the modeling effort were ‘opened up’, i.e., if more lagged predictor variables were permitted. Thus, given the results of the ASTAR and SMASTAR time series models developed in Chapters II and III, it was of interest to extend the MARS 3.0 program so that it could be used for the general modeling and analysis of any time series system. In particular, the current MARS 3.0 time series program includes: simplified input of the program parameters and of different input time series; automatic construction of the regression matrix for up to three input time series and any combination of lagged predictor variables; automatic computation of the memory required for the array space used during execution of the MARS 3.0 program; and model output that facilitates analysis of the ASTAR or SMASTAR time series model. In addition, a major change is the inclusion of model selection criteria other than GCV* (discussed in Chapter V), the original model selection criterion in the MARS 3.0 program. Note that the Fortran programs presented in the appendices are for use with Microway NDP Fortran 2.1.4 under DOS with the Microway Weitek coprocessor. This Fortran uses the PharLap DOS Extender to enable programs to use all available RAM. However, almost identical programs are available for IBM mainframe computers running VS Fortran.

This chapter is divided into two sections. Section A discusses the Fortran programs developed for time series modeling and analysis using the MARS 3.0 program. Section B briefly reports on the modeling and analysis of the Granite Canyon sea-surface temperatures using the MARS 3.0 program and the Fortran programs discussed in Section A. The sea-surface temperatures form a very long and complex data set with interesting phenomena on many time scales, so it is interesting to see how the MARS methodology handles this time series.

A. NEW FORTRAN SUBROUTINES FOR MODELING TIME SERIES SYSTEMS USING MARS 3.0

As presently constituted, the MARS 3.0 program is not simple to use for time series modeling and analysis. It requires that various program parameters be set, does not manage memory requirements for different modeling projects, and requires a complete regression design matrix as program input. To overcome these difficulties, Fortran programs were developed for time series modeling and analysis using the MARS 3.0 program. Appendices A through C are Fortran programs that prepare and execute the MARS 3.0 program for the nonlinear modeling and analysis of time series systems. A BATCH program (Appendix A) provides useful user information and sequentially executes the MARSBLD (Appendix B) and MARSDRV (Appendix C) Fortran programs. The BATCH program first calls MARSBLD, which asks for the names of up to 3 input time series files and then prepares the regression design matrix and program parameters for input into the MARS 3.0 program. Each time series is located in a separate file whose leading lines contain the model parameters (including the lagged predictor variables) necessary for running the MARS 3.0 program. Next, the BATCH program calls MARSDRV, which first computes the memory required by the MARS 3.0 program and then, if the memory allocation is sufficient, initiates the MARS algorithm described in Chapter II. The only value that may need adjustment is the parameter for MARS 3.0's memory allocation, located on the 2nd line of MARSDRV.
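MARSBLD's central job is to turn the input series and their lag lists into a regression design matrix. The Python sketch below shows one way to do that using the quantities N, LAGS and d* described in this section. It illustrates the idea only; it is not a translation of the Fortran in Appendix B, and the function and column names are hypothetical.

```python
import numpy as np


def lagged_design_matrix(series, lags, target=0):
    """Stack lagged copies of up to three series into a regression design matrix.

    series : list of 1-D sequences of equal length N
    lags   : list of increasing lag lists, one per series (like LAGS)
    target : index of the series being modeled
    Returns (X, y, names) with N - d* rows, where d* is the largest lag.
    """
    arrays = [np.asarray(s, dtype=float) for s in series]
    n = len(arrays[target])
    d_star = max(max(ls) for ls in lags if ls)
    cols, names = [], []
    for k, (s, ls) in enumerate(zip(arrays, lags)):
        for lag in ls:
            cols.append(s[d_star - lag:n - lag])   # entry for time t is s[t - lag]
            names.append(f"series{k}_lag{lag}")
    X = np.column_stack(cols)
    y = arrays[target][d_star:]                    # the first d* values only initialize
    return X, y, names
```

For a series of length 10 modeled with lags 1 and 3, plus lag 2 of a second series, d* = 3 and the matrix has 7 rows and 3 columns.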

The first 3 records of each input time series contain the model parameters necessary for initiating the MARS 3.0 program. Parameters common to the entire program are marked with an asterisk (*) and are actually read from the first input time series. The model parameters are:

1. N — The total length of the time series system to be investigated, including the initialization values; i.e., only N - d* values will be modeled, where d* is the maximum lag across all input time series. Each input time series must be of length at least N.

2. P — The total number of predictor variables from the input time series. For example, 
a time series that is modeled with lags 1, 2 and 10 uses three predictor variables. 

3. MI — The maximum level (upper bound) of interactions permitted in the generated ASTAR or SMASTAR model. In general, this parameter should be set to MI ≤ 3. Models permitted to form higher-level interactions are difficult to analyze and tend to become unstable.

4. NK* — The maximum number of steps in the forward-step MARS algorithm. The 
forward-step algorithm is followed by a backward-step algorithm that trims excess 
terms from the model. 

5. MS* — The minimum span (in the form of the number of data points) between 
adjacent thresholds on a lagged predictor variable. This model parameter can be 
thought of as a smoothing parameter similar to the bandwidth in kernel smoothing. 
A large value of MS permits fewer threshold values on a given predictor variable. 

6. DF* — The degrees of freedom charged for the selection of a predictor variable, threshold value and coefficient for inclusion in a MARS model. In general, values of 2 ≤ DF ≤ 4 are recommended, with DF = 3 used most frequently.

7. MSC* — The model selection criterion for use within MARS 3.0. The alternatives 
(discussed in Chapter V) include Friedman’s GCV*, Akaike’s AIC, Schwarz and 
Rissanen’s SC, and Amemiya’s PC. 

8. LX — Flag for each lagged predictor variable of each time series. 

(a) 0 - Directs the predictor variable be excluded from the model. 

(b) 1 - The predictor variable has no restriction. The predictor variable can enter 
the model with or without a threshold value and also can enter the model as an 
interaction with other predictor variables. 

(c) 2 - An additive predictor variable. The predictor variable can enter the model 
with or without a threshold value. However, the predictor variable is not per- 
mitted to enter the model as an interaction with other predictor variables. 
(d) 3 - A linear predictor variable. The predictor variable can enter the model only
as a linear variable, i.e., without an internal threshold. It is not permitted to 
enter the model as an interaction with other predictor variables. 

(e) -1 - A categorical predictor variable with no restriction. The categorical predictor 
variable can enter the model as an interaction with other predictor variables.

(f) -2 - An additive categorical predictor variable. The predictor variable can enter 
the model but is not permitted to interact with other predictor variables. 

9. LAGS — The actual identification of the lagged predictor variables for each time series. For example, LAGS = 1, 2 and 5 means the 1st, 2nd and 5th lagged predictor variables. The lags must be ordered from smallest to largest. The maximum value of LAGS across all input time series dictates the number of values used for model initialization, d*.
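To see how the DF* charge enters model selection, here is a sketch of generalized cross validation in the form commonly used with MARS, where each threshold costs DF degrees of freedom on top of one per basis function. This is an assumption for illustration: the GCV* criterion actually used in MARS 3.0 is discussed in Chapter V and may differ in detail, and the function name is hypothetical.

```python
def gcv(rss, n, n_terms, n_knots, df=3.0):
    """Generalized cross validation score in the usual MARS form (assumed).

    rss     : residual sum of squares of the fitted model
    n       : number of modeled observations (N - d* in the text)
    n_terms : number of basis functions, including the constant
    n_knots : number of thresholds in the model
    df      : charge per threshold, the DF* parameter
    """
    c = n_terms + df * n_knots          # effective number of parameters
    if c >= n:
        return float("inf")             # model too large for the data
    return (rss / n) / (1.0 - c / n) ** 2
```

With 13 terms and 8 thresholds (the sizes quoted for Model ICE486), raising DF from 2 to 4 raises the effective parameter count from 29 to 45, so larger DF values favor smaller models.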

The sample output (Appendix D) of an ASTAR or SMASTAR model that results from the execution of the MARS 3.0 program includes: a summary of the model parameters (discussed above), the forward and backward steps of the MARS algorithm, the final MARS model matrix, the relative benefit of each lagged predictor variable included in the final model, and the final model output in a form that permits some model analysis.

B. GRANITE CANYON SEA-SURFACE TEMPERATURES 

The Granite Canyon time series is a large data set of daily raw sea-surface temperatures taken at Granite Canyon, a point just north of Big Sur along the coast of California. Using MARS 3.0 and the Fortran programs discussed in Section A, three ASTAR time series models of these data were developed, for test purposes and for comparison with the results of a previous modeling effort by Breaker and Lewis (1985). The next two parts of this section give brief background on the Granite Canyon sea-surface temperatures and on the modeling effort of Breaker and Lewis (1985). The last part of this section discusses three ASTAR time series models of the Granite Canyon sea-surface temperatures developed using the MARS 3.0 program and the Fortran programs discussed in Section A.

1. Sea-Surface Temperatures 

Sea-surface temperatures (SSTs) and their changes in time and space (ocean depth, longitude and latitude) contribute to our understanding of complex ecological issues such as the dispersal of pollutants and fisheries biology. Investigations along the U.S. Pacific Coast indicate that coastal SSTs can be useful indicators of ocean temperature variability, are representative of phenomena occurring over wide regions, are related to other ocean and atmospheric variables, and have consistent internal structure. Two major factors that contribute to the seasonal variability of SSTs along the California coast are coastal upwelling and the coastal countercurrent. Many investigations of SSTs appear to focus on temperature anomalies that can persist for several months and influence wide areas of the coast.

Along the California coast, SSTs are collected at approximately 25 locations. The coastal observations often extend over many years and thus provide a unique opportunity to examine coastal variability over relatively long periods. For locations where the measuring site has good exposure to the adjacent continental shelf and slope, measurements of SSTs may be particularly revealing with respect to some of the physical processes that occur regionally as well as locally. Additional SST background material and references are available in Breaker and Lewis (1988).

2. Spectral Decomposition of the Granite Canyon Sea-Surface Temperatures

The purpose of Breaker and Lewis (1985) was to model the behavior of the 12-year Granite Canyon data set (Figure 35), to use the model and other statistical techniques to predict future values of the series, and to provide a descriptive interpretation of the Granite Canyon data from the oceanographic viewpoint. The model considered was

$$Y_t = M_t + S_t + \epsilon_t, \tag{56}$$

where $M_t$ is a linear trend, $S_t$ consists of seasonal and cyclic changes, and $\epsilon_t$ is a zero-mean, constant-variance, stationary random sequence that describes irregular fluctuations and is independent of the other model components.

The modeling procedure initially used least-squares regression to identify the linear component $M_t = 10.9 + 0.000374\,t$. There is no doubt that there is an evolutionary trend in the data, probably part of a long-term cycle. Unless removed, it corrupts the periodogram with large values at very low frequencies.
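The trend-removal step is ordinary least squares on the time index. The NumPy sketch below illustrates it; it is not Breaker and Lewis's code, the function name is hypothetical, and indexing time from $t = 0$ is an assumption (shifting the origin changes only the intercept).

```python
import numpy as np


def fit_linear_trend(y):
    """Least-squares line M_t = a + b t (t = 0, 1, ..., n-1) and the detrended series."""
    y = np.asarray(y, dtype=float)
    t = np.arange(y.size, dtype=float)
    b, a = np.polyfit(t, y, 1)          # polyfit returns the highest power first
    return a, b, y - (a + b * t)
```

With the reported slope of 0.000374 °C per day, the line rises about 0.000374 × 4380 ≈ 1.64 °C over the record, consistent with the rise from about 10.9 to 12.5 °C quoted for Figure 35.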

Figure 35. The record of 12 years of daily raw sea-surface temperatures at Granite Canyon from 1 March 1971 to 1 March 1983, taken at approximately 0800 hours each morning (horizontal axis: time in days). The daily sea-surface temperatures range from 8.0 to 17.0 °C, with a mean value of 11.7 °C. A least-squares fit of a linear trend to the data indicates that the ‘average’ temperature rose from about 10.9 to 12.5 °C during the 4380 days of the data set. This change in overall temperature level is evident to the fishing industry and others as a gross change in the animals and flora seen in the area during this time. Note that the El Nino phenomenon is clearly evident in the record, particularly in 1979 (about day 3200).

Next, after the data were detrended with the linear component $M_t$, the components $S_t$ and $\epsilon_t$ were identified using a complex iterative method combining spectral decomposition and autoregressive time series modeling. The resulting model's long-term cyclical and seasonal component, $S_t$, used 13 terms with periods of 46.6, 182.5, 243.3, 365, 398, 486.6, 547.5, 625.7, 730, 876, 1095, 1460 and 2190 days. There is no physical basis for some of these fixed-cycle components; they represent an unsatisfactory edifice for model building. However, there is a definite long-term (3 to 5 year) effect of El Nino, which one would clearly like to model and predict. The error term, $\epsilon_t$, was modeled as an AR(2) process with a standard deviation of 0.537, although a hump in the correlogram at a lag of about 14 days was unaccounted for. Subsequent analysis of the fitted values and fitted residuals indicated that the model was reasonably adequate and accurate. One- and two-step predictions (Breaker and Lewis, 1985) of the 30 days immediately following the modeling period (1-30 March 1983) resulted in predictive MSEs of 0.40 and 0.57, respectively.
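A simple way to see what a fixed-cycle component such as $S_t$ involves is ordinary harmonic regression at the 13 reported periods. The sketch below assumes one cosine/sine pair per period fitted by least squares; Breaker and Lewis identified $S_t$ with a more elaborate iterative spectral method, so this is only an approximation of their approach, and the function names are hypothetical.

```python
import numpy as np

# Periods (days) of the 13 fixed-cycle terms reported for S_t.
PERIODS = [46.6, 182.5, 243.3, 365, 398, 486.6, 547.5,
           625.7, 730, 876, 1095, 1460, 2190]


def harmonic_design(n, periods=PERIODS):
    """Columns cos(2 pi t / P) and sin(2 pi t / P) for each period P, t = 0..n-1."""
    t = np.arange(n, dtype=float)
    cols = []
    for p in periods:
        w = 2.0 * np.pi * t / p
        cols += [np.cos(w), np.sin(w)]
    return np.column_stack(cols)


def fit_seasonal(detrended, periods=PERIODS):
    """Least-squares fit of S_t on the harmonic columns; returns (S_hat, residual)."""
    y = np.asarray(detrended, dtype=float)
    H = harmonic_design(y.size, periods)
    coef, *_ = np.linalg.lstsq(H, y, rcond=None)
    s_hat = H @ coef
    return s_hat, y - s_hat
```

The residual from such a fit would then play the role of $\epsilon_t$, to be modeled, for example, as an AR(2) process.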

3. ASTAR Models of the Granite Canyon Sea-Surface Temperatures

Three ASTAR models of the Granite Canyon sea-surface temperatures were developed using MARS 3.0 and the Fortran programs discussed in Section A. The first model (Granite1) used lags 1 to 49 and lag 365 of the sea-surface temperature series as the model predictor variables; the second model (Granite2) used lags 1 to 50 of the sea-surface temperature series and a discrete-valued cosine and sine curve with a period of 1 year as the model predictor variables (the cosine and sine predictor variables were restricted to linear terms, i.e., they were not permitted to interact with other predictor variables or to form threshold terms); the third model (Granite3) used lags 1 to 50 of the sea-surface temperature series as the model predictor variables. The model parameters were MI = 3, the maximum level of interaction in the ASTAR model; MS = 50, the minimum span between threshold values on a predictor variable; and NK = 60, the number of forward steps in the MARS algorithm, with N = 4380 days of sea-surface temperatures.

The three ASTAR time series models are similar. Appendix D is the output of the second ASTAR model (Granite2), with lags 1 to 50 and a discrete-valued cosine and sine curve with a period of 1 year as the input predictor variables. The model contains 45 terms (a model constant, 5 one-way, 10 two-way and 29 three-way interactions) and 27 threshold values (one each on lags 5, 7, 15, 17, 19, 20, 25, 29, 30, 31, 36, 39, 44, 45 and 47; two each on lags 2 and 3; three on lag 35; and five on lag 1). The lag 14 and lag 26 predictor variables enter the model as linear terms, i.e., without interior threshold values. This is interesting because the 14-day effect (the hump in the correlogram noted above) is clearly seen in the correlogram of the detrended data.

Using the relative loss of model fit due to the removal of each term from the model, the most important terms in the model are the lag 1, 2, 14, 36, cosine, 3 and 35 terms, followed by the other terms of the model. The appearance of the lag 26 predictor variable is interesting; it corresponds to the effect, whose origin is as yet unknown, reported in Breaker and Lewis (1988). ASTAR Model Granite2 is

$$
\begin{aligned}
X_t = {} & 15.78 - 0.103\cos(2\pi t/365) \\
& + 1.124\,(X_{t-1} - 15.4)_+ - 1.042\,(15.4 - X_{t-1})_+ \\
& - 0.075\,(X_{t-2} - 8.00)_+ + 0.051\,(X_{t-20} - 9.50)_+ \\
& + 0.015\,(X_{t-2} - 8.00)_+ (X_{t-3} - 13.6)_+ (X_{t-45} - 13.0)_+ \\
& - 0.024\,(X_{t-2} - 8.00)_+ (12.8 - X_{t-5})_+ (X_{t-36} - 12.4)_+ \\
& - 0.006\,(X_{t-2} - 8.00)_+ (11.8 - X_{t-7})_+ (13.4 - X_{t-17})_+ \\
& - 0.008\,(X_{t-2} - 8.00)_+ (X_{t-7} - 11.8)_+ (13.4 - X_{t-17})_+ \\
& - 0.031\,(X_{t-2} - 14.8)_+ (X_{t-14} - 8.00)_+ (X_{t-26} - 8.00)_+ \\
& + 0.051\,(X_{t-2} - 14.8)_+ (X_{t-14} - 8.00)_+ (X_{t-36} - 8.00)_+ \\
& + 0.044\,(14.8 - X_{t-2})_+ (X_{t-14} - 8.00)_+ (X_{t-39} - 15.0)_+ \\
& + 0.013\,(X_{t-2} - 8.00)_+ (X_{t-17} - 13.4)_+ (13.4 - X_{t-31})_+ \\
& + 0.015\,(X_{t-2} - 8.00)_+ (X_{t-17} - 13.4)_+ (X_{t-31} - 13.4)_+ \\
& - 0.042\,(X_{t-2} - 8.00)_+ (X_{t-17} - 13.4)_+ (X_{t-44} - 14.9)_+ \\
& - 0.027\,(X_{t-2} - 8.00)_+ (X_{t-19} - 9.10)_+ (10.0 - X_{t-35})_+ \\
& - 0.005\,(X_{t-2} - 8.00)_+ (X_{t-19} - 9.10)_+ (X_{t-35} - 10.0)_+ \\
& - 0.016\,(X_{t-2} - 8.00)_+ (10.2 - X_{t-30})_+ (12.4 - X_{t-36})_+ \\
& - 0.006\,(X_{t-2} - 8.00)_+ (X_{t-30} - 10.2)_+ (12.4 - X_{t-36})_+ \\
& - 0.029\,(X_{t-2} - 8.00)_+ (X_{t-36} - 12.4)_+ (11.8 - X_{t-47})_+ \\
& - 0.019\,(14.8 - X_{t-3})_+ (X_{t-14} - 8.00)_+ (10.1 - X_{t-29})_+ \\
& - 0.003\,(14.8 - X_{t-3})_+ (X_{t-14} - 8.00)_+ (X_{t-29} - 10.1)_+ \\
& + 0.017\,(X_{t-14} - 8.00)_+ (9.5 - X_{t-25})_+ (15.4 - X_{t-35})_+
\end{aligned}
$$
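The ranking of terms by "relative loss of model fit due to the removal of each term" can be sketched as a drop-one refit over fixed basis functions. The NumPy sketch below follows that reading; the function name is hypothetical, and normalizing the loss by the total sum of squares is our assumption, since the scaling used in the MARS 3.0 output may differ.

```python
import numpy as np


def term_importance(B, y, names):
    """Rank basis functions by the loss of fit when each one is removed.

    B     : (n, p) basis-function matrix; column 0 is the constant.
    y     : (n,) response.
    names : labels for columns 1..p-1.
    Returns (name, relative RSS increase) pairs, largest first.
    """
    B = np.asarray(B, dtype=float)
    y = np.asarray(y, dtype=float)

    def rss(cols):
        coef, *_ = np.linalg.lstsq(B[:, cols], y, rcond=None)
        r = y - B[:, cols] @ coef
        return float(r @ r)

    all_cols = list(range(B.shape[1]))
    base = rss(all_cols)
    total = rss([0])                   # sum of squares about the mean
    scores = []
    for j, name in zip(all_cols[1:], names):
        reduced = [c for c in all_cols if c != j]
        # Coefficients are refitted without term j; thresholds stay fixed.
        scores.append((name, (rss(reduced) - base) / total))
    return sorted(scores, key=lambda s: s[1], reverse=True)
```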

The results from the three ASTAR models and the spectral decomposition model (56) appear similar. The standard error of the fitted residuals of ASTAR Model Granite2 is $\sigma_\epsilon$ = 0.516 °C, versus 0.537 °C for the spectral decomposition model (56). Both models identify the yearly component as an important term, along with the importance of lag terms between lag 40 and lag 50. Figures 36-39 are plots for the analysis of the fitted residuals of the three ASTAR models. Figure 36 shows the fitted residuals from 1 March 1979 to 28 February 1980 for the three ASTAR models of the Granite Canyon sea-surface temperatures; no pattern appears to exist. Figures 37-39 show the histogram, normalized cumulative periodogram and normal probability plots of the fitted residuals from the three ASTAR models. The histograms show that the residuals are slightly positively skewed. Figure 38 shows that the residuals can be considered independent if they are normally distributed, because the normalized cumulative periodogram of the fitted residuals falls entirely within the 90% K-S bounds for white noise. However, the normal probability plots in Figure 39 show that the fitted residuals are slightly skewed with heavy tails, indicating that they are not normally distributed.
Figure 36. The fitted residuals from 1 March 1979 to 28 February 1980 for three ASTAR time
series models of 12 years of daily sea-surface temperatures taken at Granite Canyon. 
The fitted residuals from each model show no obvious pattern. The fitted residuals 
from other years are similar. 
Figure 37. The histograms of the fitted residuals from 12 years of data (1 March 1971 to 1 March
1980) for three ASTAR time series models of the Granite Canyon sea-surface temperatures.
Each histogram is overlaid with a fitted normal curve. The fitted residuals
from each model appear slightly positively skewed.
Figure 38. The normalized cumulative periodogram for the fitted residuals from 12 years of data
(1 March 1971 to 1 March 1980) for three ASTAR time series models of the Granite
Canyon sea-surface temperatures. The residuals can be considered independent if they
are normally distributed, because the normalized cumulative spectrum of the fitted
residuals falls entirely within the 90% K-S bounds about the cumulative spectrum for
white noise.
Figure 39. Normal probability plots of the fitted residuals from 12 years of data (1 March 1971
to 1 March 1980) for three ASTAR time series models of the Granite Canyon
sea-surface temperatures. Note that in all three cases the fitted residuals are slightly
skewed with heavy tails, thus indicating the nonnormality of the fitted residuals.
C. SUMMARY



The straightforward application of the MARS 3.0 program to the modeling and analysis of time series systems is not simple. This chapter discussed modifications to the MARS 3.0 program and additional Fortran programs that were developed to permit the general modeling and analysis of time series systems. Specific changes include:

- the simplification of program parameter input, along with the input of different time series;
- automatic development of the regression matrix for up to three input time series, for any combination of lagged predictor variables;
- automatic computation of the memory requirements for the array space calculations used during execution of the MARS 3.0 program;
- model output that facilitates analysis of the ASTAR or SMASTAR time series model.

A further major change is the inclusion of model selection criteria other than GCV*, the original model selection criterion in the MARS 3.0 program (discussed in Chapter V).
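The automatic construction of the regression matrix from lagged series can be illustrated in Python. This is not the Fortran code described above, and the function name `lagged_design` and its arguments are hypothetical. The sketch assumes all input series are aligned and of equal length. It drops the first max-lag observations, because some lagged predictors are unavailable for them.

```python
from typing import Dict, List, Sequence, Tuple


def lagged_design(
    y: Sequence[float],
    inputs: Dict[str, Sequence[float]],
    lags: List[Tuple[str, int]],
) -> Tuple[List[float], List[List[float]]]:
    """Return the response vector and the matrix of lagged predictors.

    y      -- response series Y_tau
    inputs -- named input series (the response itself may be one of them)
    lags   -- (series name, lag) pairs; each pair becomes one predictor column
    """
    n = len(y)
    if any(len(series) != n for series in inputs.values()):
        raise ValueError("all series must have the same length")
    start = max(lag for _, lag in lags)  # first tau at which every lag is available
    response = [y[tau] for tau in range(start, n)]
    rows = [[inputs[name][tau - lag] for name, lag in lags] for tau in range(start, n)]
    return response, rows


# Hypothetical example: SST at lags 1 and 2 plus wind at lag 1.
sst = [10.1, 10.4, 10.9, 11.2, 11.0, 10.7]
wind = [3.0, 4.5, 2.0, 1.5, 5.0, 6.0]
response, rows = lagged_design(sst, {"sst": sst, "wind": wind},
                               [("sst", 1), ("sst", 2), ("wind", 1)])
print(response)  # [10.9, 11.2, 11.0, 10.7]
print(rows[0])   # [10.4, 10.1, 4.5]
```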

The MARS methodology, in conjunction with these time series modifications, represents a new computer-intensive but systematic (automatic) modeling approach. It isolates the low-dimensional structure among the lagged predictor variables and simplifies the modeling effort. As shown in Chapters II and III, it also provides an interpretable representation of a nonlinear time series model. This representation can be used to analyze the relationships between the dependent (output) variable and the independent (explanatory) variables of nonlinear time series systems. The ASTAR time series models for the SSTs generated in this chapter took less than 20 minutes of CPU time on an IBM 3030 mainframe computer using VS Fortran.

Other efforts to model sea-surface temperatures have been limited by the size and complexity of the sea-surface temperature time series system. These limitations appear to be overcome by ASTAR and SMASTAR time series models. Of greater interest than the univariate analysis of sea-surface temperatures would be an investigation using lagged and cross-correlated sea-surface temperatures, surface winds and time as predictor variables. The application of the MARS algorithm to time series to produce SMASTAR models appears to provide this opportunity.
V. MODEL SELECTION FOR NONLINEAR TIME SERIES
MODELING USING MULTIVARIATE ADAPTIVE SPLINE
REGRESSION (MARS)



A. INTRODUCTION 

One difficulty that is often faced during the selection of the regression model is the 
problem of choosing the appropriate predictor (explanatory) variables and model dimension, 
i.e., which of the given predictor variables to include in the final model, either for the 
purpose of prediction or for the purpose of description. This chapter examines the problem 
of model dimension and variable selection when using adaptive regression splines to develop 
a nonlinear autoregressive model for a univariate or semi-multivariate time series system. 

The current MARS algorithm, formulated by Friedman (1991) and implemented in the MARS 3.0 program, uses a form of residual-squared-error as a model selection criterion, in part because of its attractive computational properties. The criterion actually used in the forward and backward steps of the MARS algorithm is a modified form of the generalized cross validation criterion (GCV), first proposed by Craven and Wahba (1979). However, one question immediately arises: is the modified GCV criterion the ‘best’ criterion for model selection with serially correlated and cross-correlated time series data? Other model selection criteria, such as Akaike’s Information Criterion (AIC) (Akaike, 1974), have been suggested for model development in a standard linear autoregressive time series setting.

Section B of this chapter briefly discusses five model selection criteria chosen for evaluation and comparison within the MARS methodology:

- GCV* (Friedman, 1991);
- Akaike’s Information Criterion (AIC) (Akaike, 1974);
- the modified AIC (AIC2) (Akaike, 1979);
- Amemiya’s Prediction Criterion (PC) (Amemiya, 1980);
- Schwarz’s criterion (SC) (Schwarz, 1978; Rissanen, 1978).

Section C of this chapter examines the ability of the different criteria to correctly identify simple linear and nonlinear models and to efficiently estimate the model coefficients. However, approximating the relationship between the response variable and the explanatory variables of the time series system may be more important than exact model specification. It may also matter more than exact identification of the relationships between the response variable and the individual predictor variables.

In this regard, Section D of this chapter examines the ability of the different model selection criteria to estimate the fitted values and the limit cycle from ASTAR Model 9 (38) of the Wolf sunspot numbers. ASTAR Model 9 resulted from the investigation of the ability of MARS to model an actual time series in a more difficult setting. Section E of this chapter discusses the application of the model selection criteria to the Vatnsdalsa riverflow data. There, in an ‘unrestricted’ modeling environment, the SC model selection criterion resulted in a better riverflow model than Model ICE486, which was developed in Chapter III using the GCV* model selection criterion.

B. MODEL SELECTION CRITERIA 

Much of the literature concerned with estimation and inference for a sample time series assumes that we are able to correctly specify the model dimension. However, this situation may be the exception. It is more likely that important explanatory variables are omitted or that extraneous explanatory variables are included in the model. As observed by Akaike (1974), the problem of model selection and fitting in the time series setting is best summarized as a “multiple decision criterion”. Numerous attempts have therefore been made to develop model selection rules and to provide some framework for their use.

This section first introduces the model selection and fitting problem (Judge et al., 1985). It then briefly discusses the current model selection criterion in MARS, GCV* (23), and four criteria from linear autoregressive time series modeling that are proposed for use within MARS. The four proposed criteria are Amemiya’s Prediction Criterion (PC) (Amemiya, 1980) and three ‘information theory’ based criteria suggested for model selection in a time series setting. The three information theory criteria are:

- Akaike’s Information Criterion (AIC) (Akaike, 1974);
- the Schwarz Criterion (SC) (Schwarz, 1978; Rissanen, 1978);
- the Modified Information Criterion (AIC2) (Akaike, 1979).

Note that the development and application of the AIC, SC, PC and AIC2 criteria are based on the investigation of linear autoregressive and moving average (ARMA) processes. Here, our investigation focuses on the application of the AIC, SC, PC and AIC2 criteria to nonlinear time series processes.
1. Model Selection

A critical aspect in determining the form of the nonparametric regression model during each step of the MARS strategy is the model selection criterion. This criterion is used to evaluate model fit and to determine the ‘proper’ model dimension. At each forward step of the MARS algorithm, it selects for addition the candidate term that most improves the overall ‘goodness-of-fit’. As discussed in Friedman (1991), at the end of the forward-step procedure there may be model terms that no longer contribute sufficiently to the model fit. Thus at each backward step of the MARS algorithm, the criterion selects for deletion the candidate term whose removal least degrades the overall ‘goodness-of-fit’. See Friedman (1991) for a discussion of the stopping rules for the forward and backward steps of the MARS algorithm.
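The backward step described above can be sketched in standard-library Python. This is a simplified illustration, not Friedman’s implementation. Its simplifications are:

- candidate terms are plain columns rather than spline basis functions;
- the fit is ordinary least squares with an intercept;
- the complexity is the number of fitted coefficients rather than $C(M)^*$;
- GCV is used as the criterion.

All function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(y: Sequence[float], columns: List[Sequence[float]]) -> float:
    """Residual sum of squares of the least squares fit of y on an intercept plus columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    coef = solve(xtx, xty)
    return sum((yi - sum(c * v for c, v in zip(coef, row))) ** 2 for row, yi in zip(design, y))


def gcv(rss_value: float, n: int, c: float) -> float:
    return (rss_value / n) / (1.0 - c / n) ** 2


Criterion = Callable[[float, int, float], float]


def backward_prune(y: Sequence[float], terms: Dict[str, Sequence[float]],
                   criterion: Criterion = gcv) -> Tuple[List[str], float]:
    """Delete one term per step; return the best-scoring subset seen along the way."""
    n = len(y)

    def score(names: List[str]) -> float:
        # c counts the intercept plus one coefficient per term (a stand-in for C(M)*).
        return criterion(rss(y, [terms[t] for t in names]), n, len(names) + 1)

    current = list(terms)
    best_names, best_score = current[:], score(current)
    while current:
        # Remove the term whose deletion least degrades the criterion.
        candidates = [[t for t in current if t != drop] for drop in current]
        current = min(candidates, key=score)
        value = score(current)
        if value < best_score:
            best_names, best_score = current[:], value
    return best_names, best_score


n = 60
x1 = [math.sin(0.3 * i) for i in range(n)]
x2 = [math.cos(1.7 * i) for i in range(n)]
x3 = [((37 * i) % 11) / 11.0 for i in range(n)]
y = [1.0 + 2.0 * a + 0.05 * math.sin(5.1 * i) for i, a in enumerate(x1)]
kept, value = backward_prune(y, {"x1": x1, "x2": x2, "x3": x3})
print(kept)
```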

Without loss of generality, assume that MARS is in the backward stepwise procedure, i.e., trimming excess terms from the time series model. As in (1), assume there are $N$ samples of $Y$ and $X$, namely $\{(Y_\tau, X_\tau)\}_{\tau=1}^{N}$. Following Judge et al. (1985), we can discuss the problem of model selection at a given step in the MARS procedure using the parameterized linear statistical model

$$Y = X\beta + e = X_1\beta_1 + X_2\beta_2 + e, \qquad (57)$$

where $Y$ is the $N$-dimensional response vector for the model and $X = [X_1, X_2]$ is the current $(N \times k)$ design matrix. The blocks $X_1$ and $X_2$ have dimensions $(N \times k_1)$ and $(N \times k_2)$ respectively, and $e$ is an $N$-dimensional error vector with mean zero and variance $\sigma_e^2$. The $k$-dimensional vector $\beta$ of unknown parameters is likewise partitioned into parameter vectors $\beta_1$ and $\beta_2$, of dimension $k_1$ and $k_2$ respectively. The least squares estimators of $\beta$ and $\sigma_e^2$ are $b = [b_1, b_2]'$ and $\hat{\sigma}_e^2$ respectively.

If model (57) is correct, i.e., the proper dimension of the model is in fact $k$, then the least squares estimators $b$ and $\hat{\sigma}_e^2$ are minimum variance unbiased estimators of $\beta$ and $\sigma_e^2$. Now assume that the matrix $X_2$ contains the model terms proposed for possible elimination during the backward step of the MARS algorithm. The question of interest is whether or not to trim ‘excess’ terms from the model, i.e., whether or not to set $\beta_2 = 0$.

Eliminating necessary terms from the model (setting $\beta_2 = 0$ when in fact $\beta_2 \neq 0$) biases the least squares estimators $b$ and $\hat{\sigma}_e^2$ and the estimates of $Y$. Conversely, failing to eliminate unnecessary terms (leaving $\beta_2$ unrestricted when in fact $\beta_2 = 0$) increases the variance (roughness) of the least squares estimator $b$ and of the estimates of $Y$ (Rao, 1971). Thus deciding whether or not to set $\beta_2 = 0$ requires an implicit or explicit determination of the tradeoff between the conflicting objectives of bias and variance. One approach for weighing this trade-off, and for deciding whether or not to set $\beta_2 = 0$, is overall mean square error (MSE).
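The bias-variance trade-off can be illustrated with a small Monte Carlo experiment. This toy setup is an assumption and not one of the experiments of this chapter. It uses two correlated regressors, no intercept, and true $\beta_1 = 1$. The second coefficient $\beta_2$ is either nonzero ($X_2$ needed) or zero ($X_2$ unnecessary). The sketch compares the estimate $b_1$ from the full model with the estimate obtained after setting $\beta_2 = 0$. For each, it reports bias, variance and MSE $=$ bias$^2$ $+$ variance.

```python
import random
import statistics
from typing import List, Tuple


def simulate(beta2: float, n: int = 50, reps: int = 1000,
             seed: int = 1) -> Tuple[List[float], List[float]]:
    """Monte Carlo estimates of b1 from the full and the restricted (beta2 = 0) fits.

    Assumed model (no intercept): Y = 1.0 * X1 + beta2 * X2 + e, with X2 = 0.8 X1 + u
    and X1, u, e independent N(0, 1).
    """
    rng = random.Random(seed)
    full, restricted = [], []
    for _ in range(reps):
        x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x2 = [0.8 * a + rng.gauss(0.0, 1.0) for a in x1]
        y = [a + beta2 * b + rng.gauss(0.0, 1.0) for a, b in zip(x1, x2)]
        s11 = sum(a * a for a in x1)
        s22 = sum(b * b for b in x2)
        s12 = sum(a * b for a, b in zip(x1, x2))
        s1y = sum(a * v for a, v in zip(x1, y))
        s2y = sum(b * v for b, v in zip(x2, y))
        # Full model: b1 from the 2 x 2 normal equations.
        full.append((s22 * s1y - s12 * s2y) / (s11 * s22 - s12 * s12))
        # Restricted model: X2 dropped, i.e. beta2 set to 0.
        restricted.append(s1y / s11)
    return full, restricted


def bias_var_mse(estimates: List[float], true_value: float = 1.0) -> Tuple[float, float, float]:
    bias = statistics.fmean(estimates) - true_value
    var = statistics.pvariance(estimates)
    return bias, var, bias * bias + var


for beta2 in (0.5, 0.0):
    full, restricted = simulate(beta2)
    print(beta2, "full:", bias_var_mse(full), "restricted:", bias_var_mse(restricted))
```

With $\beta_2 = 0.5$, dropping $X_2$ biases $b_1$ by roughly $0.5 \times 0.8 = 0.4$ but lowers its variance. With $\beta_2 = 0$, keeping $X_2$ adds variance without reducing bias. A selection criterion has to judge which effect dominates.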

MSE has been the basis for many model selection criteria, such as Mallows’ Cp (Mallows, 1973). It also underlies the GCV* (Friedman, 1988) and PC (Amemiya, 1980) criteria investigated in this chapter. These criteria, and the others investigated in this chapter, can each be divided into two distinct parts:

- a lack-of-fit part, which measures the discrepancy between the proposed model and the data (most frequently a function of the residual sum of squares);
- a complexity part, which measures model complexity (usually a function of the number of independent parameters in the model).

Adding terms to a regression model decreases the model’s lack-of-fit at the cost of a corresponding increase in model complexity. The model that minimizes a given model selection criterion across all investigated models is selected as the ‘best’ regression model.

All of the model selection criteria investigated in this chapter are some form of modification of the GCV* criterion, the current model selection criterion in MARS. They are therefore easily (though tediously) incorporated into the MARS program; this modification of the MARS 3.0 program was discussed in Chapter IV. Parzen’s CAT criterion (Parzen, 1974) would require major modifications to MARS and was therefore not considered.

2. Modified Generalized Cross Validation (GCV*)

The model selection criterion that is currently used for model selection in MARS 
is a modified form of the generalized cross validation criterion (GCV) first proposed by 
Craven and Wahba (1979). GCV was developed as an extension of the cross validation 
(CV) criterion pioneered by Stone (1977). Both Craven and Wahba (1979) and Friedman 
(1991) provide discussion and references for the development and use of the GCV criterion. 
Let the residual sum of squares between the data and the fitted model be

$$\mathrm{RSS}(M) = \sum_{i=1}^{N}\left[Y_i - \hat{f}_M(X_i)\right]^2. \qquad (58)$$

Then the modified generalized cross validation criterion (GCV*) used in a MARS model with subregions $\{R_j\}_{j=1}^{M}$ is

$$\mathrm{GCV}(M) = \frac{\mathrm{RSS}(M)/N}{\left[1 - C(M)^*/N\right]^2}. \qquad (59)$$

As discussed in Chapter II, GCV* and GCV differ in the computation of $C(M)^*$, a model complexity penalty function that is increasing in $M$. Here $M$ is the number of nonconstant basis functions in the MARS model (Friedman, 1991). $C(M)^*$ represents the number of independent model parameters in a MARS model with $M$ subregions. It accounts for the heavy use of the data in determining the predictor variables and the predictor variable partition points, in addition to the usual model coefficients. Typically the residual sum of squares decreases as the model becomes more complex. However, the penalty factor $\left[1 - C(M)^*/N\right]^{-2}$ increases, so at some point a minimum is reached.
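The trade-off in (59) can be seen by scoring a sequence of candidate models. In the Python sketch below, the pairs $(\mathrm{RSS}(M), C(M)^*)$ are invented for illustration and do not come from any MARS run. The function name `gcv` is hypothetical.

```python
from typing import Dict, Tuple


def gcv(rss: float, n: int, c_star: float) -> float:
    """Modified GCV: average squared residual inflated by [1 - C(M)*/N]^-2."""
    if not 0 <= c_star < n:
        raise ValueError("need 0 <= C(M)* < N")
    return (rss / n) / (1.0 - c_star / n) ** 2


# Hypothetical sequence of models: M -> (RSS(M), C(M)*), with N = 100 observations.
path: Dict[int, Tuple[float, float]] = {
    1: (120.0, 4.0), 2: (80.0, 7.0), 3: (62.0, 10.0),
    4: (60.0, 13.0), 5: (59.5, 16.0), 6: (59.3, 19.0),
}
scores = {m: gcv(r, 100, c) for m, (r, c) in path.items()}
best_m = min(scores, key=scores.get)
print(best_m)  # 3
```

RSS keeps falling after $M = 3$, but too slowly to offset the growing penalty, so GCV is smallest at $M = 3$.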

Friedman (1991) provides valuable insights into the use of the GCV* criterion for various types of MARS modeling. However, the setting that Friedman proposes for the GCV* criterion does not assume serial correlation among the predictor variables. It is therefore an open question whether GCV* is the ‘best’ criterion within MARS for developing ASTAR and SMASTAR models with serially correlated and cross-correlated predictor variables.

3. Model Selection using Information Theory 

Many of the popular model selection criteria used in a linear time series setting are based on information theory. Most are an outgrowth of the development of the AIC criterion, which is based on the Kullback-Leibler Information Criterion (Akaike, 1974). A model selection criterion based on information theory aims to select the model that ‘best’ balances two conflicting considerations. The first is the precision of the model estimates (again a measure of the remaining lack-of-fit of the model). The second is model parsimony (usually a measure of model complexity).
a. Akaike’s Information Criterion (AIC)

The use of AIC as a model selection criterion is popular because of its simplicity. However, there are some indications that, in linear autoregressive time series modeling, the AIC criterion overestimates the number of model parameters. It favors a decrease in model lack-of-fit over model complexity, so it tends to produce over-parameterized models. The AIC criterion for a MARS model with subregions $\{R_j\}_{j=1}^{M}$ is

$$\mathrm{AIC}(M) = \ln\left[\frac{\mathrm{RSS}(M)}{N}\right] + \frac{2\,C(M)^*}{N}. \qquad (60)$$

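Note that

$$\ln\left(\mathrm{GCV}(M)\right) = \mathrm{AIC}(M) + 2\left[\ln\left(1 + \frac{C(M)^*}{N - C(M)^*}\right) - \frac{C(M)^*}{N}\right].$$

Approximating $\ln\left(1 + \frac{C(M)^*}{N - C(M)^*}\right)$ by the first three terms of its Taylor series expansion gives

$$\begin{aligned}
\ln\left(\mathrm{GCV}(M)\right) &\approx \mathrm{AIC}(M) + 2\left[\ln(1) + \frac{C(M)^*}{N - C(M)^*} - \frac{1}{2}\left(\frac{C(M)^*}{N - C(M)^*}\right)^2 - \frac{C(M)^*}{N}\right] \\
&= \mathrm{AIC}(M) + \frac{\left(C(M)^*\right)^2\left(N - 2C(M)^*\right)}{N\left(N - C(M)^*\right)^2} \\
&= \mathrm{AIC}(M) + o\!\left(\frac{1}{N}\right).
\end{aligned}$$

The AIC and GCV* criteria are therefore closely related, especially when $N$, the sample size,
is large.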
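The closeness of $\ln(\mathrm{GCV}(M))$ and $\mathrm{AIC}(M)$ can be checked numerically. The Python sketch below assumes $\mathrm{AIC}(M) = \ln[\mathrm{RSS}(M)/N] + 2C(M)^*/N$ and $\mathrm{GCV}(M) = [\mathrm{RSS}(M)/N]/[1 - C(M)^*/N]^2$. The values of $N$, $C(M)^*$ and RSS are hypothetical.

```python
import math


def aic(rss: float, n: int, c_star: float) -> float:
    """ln[RSS(M)/N] + 2 C(M)*/N."""
    return math.log(rss / n) + 2.0 * c_star / n


def gcv(rss: float, n: int, c_star: float) -> float:
    """[RSS(M)/N] / [1 - C(M)*/N]^2."""
    return (rss / n) / (1.0 - c_star / n) ** 2


n, c_star, rss = 1000, 10.0, 250.0   # hypothetical values
gap = math.log(gcv(rss, n, c_star)) - aic(rss, n, c_star)
exact = 2.0 * (math.log(1.0 + c_star / (n - c_star)) - c_star / n)
approx = c_star ** 2 * (n - 2 * c_star) / (n * (n - c_star) ** 2)
print(gap, exact, approx)  # all about 1.0e-4, far below 1/N = 1.0e-3
```

The gap does not depend on RSS at all, only on $N$ and $C(M)^*$. For a fixed complexity it therefore vanishes faster than $1/N$.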

b. Schwarz Criterion (SC) 

In response to indications that the AIC criterion over-parameterizes the model, Schwarz (1978) developed a model selection criterion using a Bayesian argument. At the same time, Rissanen developed (see Rissanen, 1987) a model selection criterion that uses stochastic complexity analysis to evaluate the uncertainty in the data. When applied to linear time series modeling, Rissanen’s criterion is equivalent to the Schwarz criterion. Note that Rissanen (1987) makes a strong case for the use of this criterion because of its apparent widespread applicability. Compared with the AIC criterion (60), the Schwarz-Rissanen (SC) criterion increases the penalty for adding terms to the model by a factor of $(1/2)\ln(N)$. The SC criterion for a MARS model with subregions $\{R_j\}_{j=1}^{M}$ is



$$\mathrm{SC}(M) = \ln\left[\frac{\mathrm{RSS}(M)}{N}\right] + \frac{\ln(N)\,C(M)^*}{N}. \qquad (61)$$
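The practical effect of the larger SC penalty can be shown with one hypothetical decision. Suppose an extra term lowers RSS from 100 to 97 with $N = 100$. The Python sketch below assumes that SC and AIC share the lack-of-fit term $\ln[\mathrm{RSS}(M)/N]$ and differ only in the per-parameter penalties $\ln(N)/N$ and $2/N$. All numbers are invented.

```python
import math


def aic(rss: float, n: int, c_star: float) -> float:
    return math.log(rss / n) + 2.0 * c_star / n


def sc(rss: float, n: int, c_star: float) -> float:
    return math.log(rss / n) + math.log(n) * c_star / n


def penalty_ratio(n: int) -> float:
    """Per-parameter SC penalty ln(N)/N divided by the AIC penalty 2/N."""
    return (math.log(n) / n) / (2.0 / n)


n = 100
print(aic(97.0, n, 6) < aic(100.0, n, 5))  # True: AIC accepts the extra term
print(sc(97.0, n, 6) < sc(100.0, n, 5))    # False: SC rejects it
```

Because $\ln(N) > 2$ once $N \ge 8$, SC penalizes each added parameter more heavily than AIC for any realistic sample size.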



c. Akaike's Bayesian Information Criterion (AIC2) 

Akaike (1979) also used a Bayesian framework to develop a criterion for selecting a more parsimonious linear time series model than the AIC criterion (60). Like the SC criterion (61), this criterion gives model complexity more weight relative to model lack-of-fit. The AIC2 criterion for a MARS model with subregions $\{R_j\}_{j=1}^{M}$ is



$$\mathrm{AIC2}(M) = \left(N - C(M)^*\right)\ln\left[\frac{\mathrm{RSS}(M)}{N - C(M)^*}\right] + C(M)^*\ln\left[\;\cdots\;\right]. \qquad (62)$$



4. Amemiya’s Prediction Criterion (PC)

To account for the cost of selecting an incorrect model, Amemiya (1980) developed a model selection criterion based on minimizing the unconditional mean squared prediction error. The result is a modification of the AIC criterion (60) that corrects for the increasing complexity due to adding terms to the MARS model. As with the SC (61) and AIC2 (62) criteria, Amemiya’s PC criterion imposes a heavier penalty than the AIC criterion (60) for adding terms to a model. Amemiya’s PC criterion for a MARS model with subregions $\{R_j\}_{j=1}^{M}$ is

$$\mathrm{PC}(M) = \frac{\mathrm{RSS}(M)}{N - C(M)^*}\left(1 + \frac{C(M)^*}{N}\right). \qquad (63)$$

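Note that

$$\mathrm{GCV}(M) = \mathrm{PC}(M) + \frac{\mathrm{RSS}(M)}{N}\left(\frac{C(M)^*}{N - C(M)^*}\right)^2 = \mathrm{PC}(M) + o\!\left(\frac{1}{N}\right),$$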






so that the PC and GCV* criteria are closely related, especially when $N$, the sample size,
is large.
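The identity relating GCV and PC can be verified numerically. The Python sketch below assumes $\mathrm{PC}(M) = [\mathrm{RSS}(M)/(N - C(M)^*)]\,(1 + C(M)^*/N)$ and $\mathrm{GCV}(M) = [\mathrm{RSS}(M)/N]/[1 - C(M)^*/N]^2$. The triples $(\mathrm{RSS}, N, C^*)$ are arbitrary test values, not results from this study.

```python
from typing import List, Tuple


def pc(rss: float, n: int, c_star: float) -> float:
    return rss / (n - c_star) * (1.0 + c_star / n)


def gcv(rss: float, n: int, c_star: float) -> float:
    return (rss / n) / (1.0 - c_star / n) ** 2


def gap_check(rss: float, n: int, c_star: float) -> Tuple[float, float]:
    """Return GCV - PC and the closed form (RSS/N) * (C*/(N - C*))^2."""
    lhs = gcv(rss, n, c_star) - pc(rss, n, c_star)
    rhs = (rss / n) * (c_star / (n - c_star)) ** 2
    return lhs, rhs


cases: List[Tuple[float, int, float]] = [(50.0, 200, 8.0), (12.5, 60, 5.0), (300.0, 1000, 25.0)]
for rss, n, c_star in cases:
    print(n, *gap_check(rss, n, c_star))
```

The difference is of order $(C(M)^*/N)^2$, so the two criteria rank models almost identically when $N$ is large relative to $C(M)^*$.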

C. SOME SIMPLE SIMULATIONS TO COMPARE MODEL SELECTION 
CRITERIA 

In Chapter II, simulations were used to show the ability of MARS to identify and estimate the coefficients of simple linear and nonlinear time series models. In this section, simulations are used for an initial examination of the relative ability of the model selection criteria discussed in Section B to do the same.

As before, the simulation of an AR(1) model with known coefficients examines the relative ability of each model selection criterion to detect and model a simple linear time series within the MARS methodology. The simulation of a threshold model with ‘AR(1)-like’ models in each disjoint subregion examines the relative ability of each criterion to detect and model simple nonlinear threshold time series within the same framework.

As in Chapter II, the interest in these simulations is two-fold. First, how often was the true model identified by each model selection criterion, and if it was not, did the criterion overestimate or underestimate the number of model parameters? Second, if the true model was identified, how well were the model parameters estimated by each model selection criterion?

1. AR(1) Time Series Model Simulations 

As in Chapter II the initial simulation experiment uses the AR(1) model. 



$$X_\tau = \rho X_{\tau-1} + K + \varepsilon_\tau, \qquad (64)$$

where $\tau = 1, 2, \ldots, N$ indexes the time series, $\rho$ is a constant coefficient, $K$ is the model constant (taken to be zero), and $\varepsilon_\tau$ is $N(0, \sigma_\varepsilon^2)$. As described in Chapter II, the model is usually considered under the stationarity condition ($|\rho| < 1$). However, nonstationary processes such as random walks ($|\rho| = 1$) and explosive processes ($|\rho| > 1$) are also of interest.
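Series for experiments of this kind can be generated from (64) as sketched below. The Python function names, the burn-in length and the seeding scheme are all assumptions and are not taken from the simulation programs used in this study. The through-the-origin slope estimate is only a quick sanity check. It is not the MARS fit, which also estimates a constant and searches for knots.

```python
import random
from typing import List


def simulate_ar1(n: int, rho: float, k: float = 0.0, sigma: float = 1.0,
                 burn_in: int = 200, seed: int = 0) -> List[float]:
    """Simulate X_tau = rho X_{tau-1} + K + eps_tau, eps_tau ~ N(0, sigma^2), from X_0 = 0.

    The first burn_in values are discarded so that a stationary series (|rho| < 1)
    starts near its stationary distribution; use burn_in=0 for random walks or
    explosive cases.
    """
    rng = random.Random(seed)
    x, out = 0.0, []
    for step in range(n + burn_in):
        x = rho * x + k + rng.gauss(0.0, sigma)
        if step >= burn_in:
            out.append(x)
    return out


def lag1_estimate(x: List[float]) -> float:
    """Least squares slope of X_tau on X_{tau-1} through the origin (K = 0 assumed)."""
    num = sum(a * b for a, b in zip(x[1:], x[:-1]))
    den = sum(b * b for b in x[:-1])
    return num / den


series = simulate_ar1(5000, 0.7, seed=3)
print(round(lag1_estimate(series), 2))  # close to 0.7
```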

Again, two categories of experiments were conducted using the AR(1) time series 
model (equation 64). 
The first experiment required the model selection criterion within MARS to estimate an AR(1) time series model from the simulated data using one lag predictor variable, $X_{\tau-1}$. The maximum number of subregions in the forward-step MARS procedure was $M = 3$. The alternatives to the AR(1) time series model in the first experiment take two forms. They either have no $X_{\tau-1}$ term (a constant model), or they have an $X_{\tau-1}$ term with an internal threshold value $t$ greater than $\min\{X_{\tau-1}\}_{\tau=1}^{N}$.

The second experiment required the model selection criterion within MARS to estimate an AR(1) time series model from the simulated data when up to four lag predictor variables, $X_{\tau-1}, \ldots, X_{\tau-4}$, are allowed. The maximum number of subregions allowed in the forward-step MARS procedure was $M = 8$. The alternative models for the second experiment are:

- constant models;
- nonlinear time series models with at least one internal threshold value;
- any time series model that includes a term other than $X_{\tau-1}$, i.e., lags $\tau - 2$, $\tau - 3$, or $\tau - 4$.

Simulation experiments were performed for various values of $\rho$ and of the smoothing parameter $MS$, the minimum number of data points between knots on the same predictor variable. Table 6 and Figures 40-45 show the simulation results for $\rho = .5$, $.7$, and $.9$. These runs used $\varepsilon_\tau \sim N(0, 1)$ and a smoothing parameter of $MS = .02N$ data points.

Table 6 shows, for a given length $N$ of the simulated time series, how many of the 100 simulated AR(1) models each model selection criterion correctly identified as AR(1). The left side of the table gives the results of the first experiment and the right side those of the second. In these experiments MARS attempted to identify the AR(1) time series model (64) from the simulated data. It used $P = 1$ (left) and $P = 4$ (right) lagged predictor variables, and $M = 3$ (left) and $M = 8$ (right) as the maximum number of subregions allowed in the forward-step MARS procedure.

Overall, the SC criterion performs best at correctly identifying the simulated data as the AR(1) time series model, for all values of $N$. The number of models correctly identified using GCV*, PC and AIC2 improves for increasing values of $N$ and becomes comparable to the performance of the SC criterion. The number of models correctly identified by AIC is low throughout the simulation experiment. Further investigation indicates that most of the incorrectly identified models developed by AIC included additional model terms. As discussed in Section B, AIC appears to overestimate the number of model
TABLE 6. AR(1) MODEL SIMULATION: The number of AR(1) simulations correctly identified by each model selection criterion within MARS for increasing values of $N$. The model parameters are $\rho = .5$ (top), $.7$ (middle) and $.9$ (bottom), and $K = 0$, with $\varepsilon_\tau \sim N(0, 1)$ and a minimum span of $MS = .02N$ data points between knots. MARS attempted to identify the AR(1) model (64) from the simulated data using $P = 1$ (left) and $P = 4$ (right) lagged predictor variables and using $M = 3$ (left) and $M = 8$ (right), the maximum number of subregions allowed in the forward-step MARS procedure. Each simulation consists of 100 replications. Overall, SC is the best model selection criterion for correctly identifying the AR(1) simulations. Also, the number of models correctly identified by GCV*, PC and AIC2 improves for increasing values of $N$. The number of models correctly identified by AIC is low throughout the simulation experiment.
parameters. For example, consider AIC applied to the AR(1) time series model with $\rho = .5$ (top of Table 6), using $P = 1$ and $M = 3$. For $N = 100$, only one of the misidentified simulations was identified as a constant model. The other 44 misidentified models included at least one internal threshold value. This may be preferable to not identifying any structure at all, since a model identified by AIC in this experiment may still closely approximate the output of the true underlying AR(1) time series model. Nevertheless, it indicates that the AIC criterion over-parameterizes a proposed model.

Figures 40-45 are a series of boxplots of the estimated coefficients for simulations correctly identified as AR(1) time series models (as reported in Table 6). Results are shown for each model selection criterion within MARS and for increasing values of $N$. For each value of $N$, the boxplots represent, from left to right, the estimated model coefficients using GCV*, AIC, PC, SC and AIC2. The estimates of $\rho$ are given in the top set of boxplots, and the estimates of $K$ in the bottom set.

The true values of the model coefficients, $\rho = .5$ (Figures 40-41), $\rho = .7$ (Figures 42-43) and $\rho = .9$ (Figures 44-45), along with $K = 0$, are marked by the dashed line across each of the boxplots. The length $N$ of each simulated time series is shown at the bottom of each boxplot. Comparing the true coefficient values with the boxplots of the estimates across increasing $N$ shows that the estimates for each model selection criterion tend to the true values as $N$ increases.

2. Nonlinear Threshold Time Series Model Simulations 

To observe the ability of each model selection criterion within MARS to capture nonlinear threshold model characteristics, we simulated the 2-subregion threshold model (Tong, 1983)

$$X_\tau = \begin{cases} \rho_1 X_{\tau-1} + \varepsilon_\tau & \text{if } X_{\tau-1} \le t, \\ \rho_2 X_{\tau-1} + \varepsilon_\tau & \text{if } X_{\tau-1} > t, \end{cases} \qquad (65)$$

where $\tau = 1, 2, \ldots, N$ indexes the time series, $\rho_1$ and $\rho_2$ are constant coefficients, $t = 0$, and $\varepsilon_\tau$ is $N(0, \sigma_\varepsilon^2)$. As in Chapter II, note that the nonlinear threshold time series model (65) has an ‘AR(1)-like’ model in each subregion. This implies that, with




Figure 40. AR(1) MODEL SIMULATION: Boxplots of the estimates from each model selection criterion for $\rho = .5$, $K = 0$ when the model selection criterion within MARS correctly identified the data as from an AR(1) model (as reflected in Table 6). For increasing values of $N$, MARS attempted to identify the AR(1) model (64) from the simulated data using $P = 1$ lagged predictor variable and $M = 3$, the maximum number of subregions allowed in the forward-step MARS procedure, with $\varepsilon_\tau \sim N(0, 1)$ and a minimum span of $MS = .02N$ data points between knots. Each simulation consists of 100 replications. The model selection criteria represented by the boxplots are, from left to right and for each value of $N$: GCV*, AIC, PC, SC and AIC2. The true values of the model coefficients, $\rho = .5$ and $K = 0$, are identified by the dashed line across each of the boxplots.




Figure 41. AR(1) MODEL SIMULATION: Boxplots of the estimates from each model selection criterion for $\rho = .5$, $K = 0$ when the model selection criterion within MARS correctly identified the data as from an AR(1) model (as reflected in Table 6). For increasing values of $N$, MARS attempted to identify the AR(1) model (64) from the simulated data using $P = 4$ lagged predictor variables and $M = 8$, the maximum number of subregions allowed in the forward-step MARS procedure, with $\varepsilon_\tau \sim N(0, 1)$ and a minimum span of $MS = .02N$ data points between knots. Each simulation consists of 100 replications. The model selection criteria represented by the boxplots are, from left to right and for each value of $N$: GCV*, AIC, PC, SC and AIC2. The true values of the model coefficients, $\rho = .5$ and $K = 0$, are identified by the dashed line across each of the boxplots.
Figure 42. AR(1) MODEL SIMULATION: Boxplots of the estimates from each model selection criterion for $\rho = .7$, $K = 0$ when the model selection criterion within MARS correctly identified the data as from an AR(1) model (as reflected in Table 6). For increasing values of $N$, MARS attempted to identify the AR(1) model (64) from the simulated data using $P = 1$ lagged predictor variable and $M = 3$, the maximum number of subregions allowed in the forward-step MARS procedure, with $\varepsilon_\tau \sim N(0, 1)$ and a minimum span of $MS = .02N$ data points between knots. Each simulation consists of 100 replications. The model selection criteria represented by the boxplots are, from left to right and for each value of $N$: GCV*, AIC, PC, SC and AIC2. The true values of the model coefficients, $\rho = .7$ and $K = 0$, are identified by the dashed line across each of the boxplots.
Figure 43. AR(1) MODEL SIMULATION: Boxplots of the estimates from each model selection criterion for $\rho = .7$, $K = 0$ when the model selection criterion within MARS correctly identified the data as from an AR(1) model (as reflected in Table 6). For increasing values of $N$, MARS attempted to identify the AR(1) model (64) from the simulated data using $P = 4$ lagged predictor variables and $M = 8$, the maximum number of subregions allowed in the forward-step MARS procedure, with $\varepsilon_\tau \sim N(0, 1)$ and a minimum span of $MS = .02N$ data points between knots. Each simulation consists of 100 replications. The model selection criteria represented by the boxplots are, from left to right and for each value of $N$: GCV*, AIC, PC, SC and AIC2. The true values of the model coefficients, $\rho = .7$ and $K = 0$, are identified by the dashed line across each of the boxplots.
Figure 44. AR(1) MODEL SIMULATION: Boxplots of the estimates from each model selection criterion for $\rho = .9$, $K = 0$ when the model selection criterion within MARS correctly identified the data as from an AR(1) model (as reflected in Table 6). For increasing values of $N$, MARS attempted to identify the AR(1) model (64) from the simulated data using $P = 1$ lagged predictor variable and $M = 3$, the maximum number of subregions allowed in the forward-step MARS procedure, with $\varepsilon_\tau \sim N(0, 1)$ and a minimum span of $MS = .02N$ data points between knots. Each simulation consists of 100 replications. The model selection criteria represented by the boxplots are, from left to right and for each value of $N$: GCV*, AIC, PC, SC and AIC2. The true values of the model coefficients, $\rho = .9$ and $K = 0$, are identified by the dashed line across each of the boxplots.




Figure 45. AR(1) MODEL SIMULATION: Boxplots of the estimates from each model selection criterion for $\rho = .9$, $K = 0$ when the model selection criterion within MARS correctly identified the data as from an AR(1) model (as reflected in Table 6). For increasing values of $N$, MARS attempted to identify the AR(1) model (64) from the simulated data using $P = 4$ lagged predictor variables and $M = 8$, the maximum number of subregions allowed in the forward-step MARS procedure, with $\varepsilon_\tau \sim N(0, 1)$ and a minimum span of $MS = .02N$ data points between knots. Each simulation consists of 100 replications. The model selection criteria represented by the boxplots are, from left to right and for each value of $N$: GCV*, AIC, PC, SC and AIC2. The true values of the model coefficients, $\rho = .9$ and $K = 0$, are identified by the dashed line across each of the boxplots.
the assumed constant variance for ε_t in both regions, X_t can have a different variance in
each of the two subregions. Also, for a threshold at t = 0, the expected number of sample
values in each subregion will be the same only if ρ1 = -ρ2.
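To make the simulation design concrete, the sketch below generates series from the AR(1) model (64) and from a two-regime threshold AR(1) model. It is a minimal illustration under stated assumptions, not the code used for these experiments: `simulate_ar1` and `simulate_threshold_ar1` are hypothetical names, the threshold model is assumed to apply ρ1 when X_{t-1} ≤ t and ρ2 otherwise (the exact form of model (65) is given earlier in the chapter), and the second argument of N(0, ·) is read as a variance.

```python
import numpy as np


def simulate_ar1(n, rho, k=0.0, noise_var=1.0, x0=0.0, rng=None):
    """AR(1) model (64): X_t = rho * X_{t-1} + K + eps_t, eps_t ~ N(0, noise_var)."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.normal(0.0, np.sqrt(noise_var), size=n)
    x = np.empty(n)
    prev = x0
    for t in range(n):
        prev = rho * prev + k + eps[t]
        x[t] = prev
    return x


def simulate_threshold_ar1(n, rho1, rho2, threshold=0.0, noise_var=0.5, x0=0.0, rng=None):
    """Two-regime threshold AR(1), ASSUMED form:
    X_t = rho1 * X_{t-1} + eps_t if X_{t-1} <= threshold, else rho2 * X_{t-1} + eps_t.
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.normal(0.0, np.sqrt(noise_var), size=n)
    x = np.empty(n)
    prev = x0
    for t in range(n):
        rho = rho1 if prev <= threshold else rho2  # regime chosen by the lagged value
        prev = rho * prev + eps[t]
        x[t] = prev
    return x


# One replication of length N = 200 with rho1, rho2 = .8, .4.
series = simulate_threshold_ar1(200, 0.8, 0.4, rng=np.random.default_rng(1))
print(series.shape)  # (200,)
```

Calling the generator 100 times with independent random streams yields one batch of replications of the kind summarized in Table 7; passing an explicit `rng` keeps each batch reproducible.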

Two categories of experiments were conducted using the threshold model. 

The first experiment required each model selection criterion within MARS to
estimate a model from the simulated data of the nonlinear threshold time series model using
one lagged predictor variable, X_{t-1}, and M = 4, the maximum number of subregions in
the forward-step MARS procedure. The alternative models in the first experiment include the
constant model, linear autoregressive time series models, and nonlinear time series models
that have more than one internal threshold.

The second experiment required each model selection criterion within MARS to
estimate a model from the simulated data of the nonlinear threshold time series model
where up to four lagged predictor variables are allowed, using M = 10, the
maximum number of subregions allowed in the forward-step MARS procedure. The alternative
models in the second experiment include the constant model, linear and nonlinear
autoregressive time series models with terms other than X_{t-1} (e.g., X_{t-2}), and nonlinear
time series models with more than one internal threshold value on X_{t-1}.

Simulation experiments were performed for various combinations of ρ1 and ρ2 and
for various values of the smoothing parameter MS, the minimum number of data points
between knots on the same predictor variable. Table 7 and Figures 46-49 show the simulation
results for ρ1, ρ2 = .8, .4 and -.6, .6, using ε_t ~ N(0, .5) with the smoothing parameter
MS = .02N data points. Table 7 shows the number of simulations correctly identified
as threshold time series models by each model selection criterion for a given length N of the
simulated time series. The left and right sides of the table give the results of the first
and second experiments, in which MARS attempted to identify the nonlinear threshold time
series model (65) from the simulated data using P = 1 (left) and P = 4 (right) lagged
predictor variables and using M = 4 (left) and M = 10 (right), the maximum number of
subregions allowed in the forward-step MARS procedure.

Overall, the SC and AIC2 criteria perform best at correctly identifying the
simulated data as the simple nonlinear threshold time series model (65). For the first
experiment (left), P = 1 and M = 4, all the model selection criteria appear to perform
equally well. In the second experiment (right), P = 4 and M = 10, the SC and AIC2
TABLE 7. THRESHOLD MODEL SIMULATION: The number of threshold simulations correctly
identified by each model selection criterion within MARS for increasing values
of N. The model parameters are ρ1, ρ2 = .8, .4 (top) and -.6, .6 (bottom), and t = 0,
with ε_t ~ N(0, .5) and a minimum span of MS = .02N data points between knots.
MARS attempted to identify the threshold model (65) from the simulated data using
P = 1 (left) and P = 4 (right) lagged predictor variables and using M = 4 (left)
and M = 10 (right), the maximum number of subregions allowed in the forward-step
MARS procedure. Each simulation consists of 100 replications. All the model
selection criteria perform well in identifying the threshold simulations.
criteria perform best, followed closely by the PC and GCV* criteria, as the values of
N increase. However, the AIC criterion's performance in the second experiment does not
improve for increasing values of N, and AIC falls behind at correctly identifying the nonlinear
threshold time series model (65). Again, as in the experiments with the AR(1) simulations,
the majority of models incorrectly identified by the AIC criterion included additional terms;
i.e., AIC appears to overestimate the number of parameters in the model.

Figures 46-49 are a series of boxplots of the estimated coefficients of the simulations
correctly identified as threshold time series models (as reported in Table 7) by
each model selection criterion within MARS for increasing values of N. For each value of
N, the boxplots represent the estimated model coefficients using, from left to right, GCV*,
AIC, PC, SC and AIC2. The estimates for ρ1 (top), ρ2 (middle) and t (bottom) are given.
The true value of each model coefficient, ρ1, ρ2 = .8, .4 (Figures 46-47), ρ1, ρ2 = -.6, .6
(Figures 48-49), and t = 0, is identified by the dashed line across each of the boxplots.
The length N of each simulated time series is given at the bottom of each boxplot. The
estimated values of the model coefficients tend to their true values as N increases.
Because of several outliers, the performance of PC and GCV* at estimating ρ2 is initially
disappointing.

3. Summary of AR(1) and Threshold Model Simulations 

Overall, SC was the best criterion at identifying the model and estimating its
coefficients from the simulated data of these simple AR(1) and nonlinear threshold time series
models. The SC criterion was consistent for all values of the AR(1) and threshold model
coefficients, while the other criteria at times had difficulty, especially for small N (see, e.g.,
Table 6 with ρ = .5). The performance of the SC criterion was followed by that of the AIC2
criterion and then the PC and GCV* criteria. In all cases with the exception of AIC,
the number of correctly identified simulation models improved for increasing values of N.
Also, for increasing values of N, when the AR(1) or nonlinear threshold time series model
was correctly identified, the precision of the estimates of the model coefficients improved
for each model selection criterion. It was noted that when AIC incorrectly identified a
model, it added additional terms to the model (in almost all cases), i.e., AIC was able to
Figure 46. THRESHOLD MODEL SIMULATION: Boxplots of the estimates for ρ1, ρ2 = .8, .4,
and t = 0 when the model selection criterion within MARS correctly identified the
data as a threshold model (as reflected in Table 7). For increasing values of N, MARS
attempted to identify the threshold model (33) from the simulated data using P = 1
lagged predictor variable and M = 4, the maximum number of subregions allowed
in the forward-step MARS procedure, with ε_t ~ N(0, .5) and a minimum span of MS
= .02N data points between knots. Each simulation consists of 100 replications.
The model selection criteria represented by the boxplots are, from left to right and
for each value of N: GCV*, AIC, PC, SC and AIC2. The true values of the model
coefficients, ρ1 = .8, ρ2 = .4 and t = 0, are identified by the dashed line across each
of the boxplots.
Figure 47. THRESHOLD MODEL SIMULATION: Boxplots of the estimates for ρ1, ρ2 = .8, .4,
and t = 0 when the model selection criterion within MARS correctly identified the
data as a threshold model (as reflected in Table 7). For increasing values of N, MARS
attempted to identify the threshold model (33) from the simulated data using P = 4
lagged predictor variables and M = 10, the maximum number of subregions allowed
in the forward-step MARS procedure, with ε_t ~ N(0, .5) and a minimum span of MS
= .02N data points between knots. Each simulation consists of 100 replications.
The model selection criteria represented by the boxplots are, from left to right and
for each value of N: GCV*, AIC, PC, SC and AIC2. The true values of the model
coefficients, ρ1 = .8, ρ2 = .4 and t = 0, are identified by the dashed line across each
of the boxplots.
Figure 48. THRESHOLD MODEL SIMULATION: Boxplots of the estimates for ρ1, ρ2 = -.6, .6,
and t = 0 when the model selection criterion within MARS correctly identified the
data as a threshold model (as reflected in Table 7). For increasing values of N, MARS
attempted to identify the threshold model (65) from the simulated data using P = 1
lagged predictor variable and M = 4, the maximum number of subregions allowed
in the forward-step MARS procedure, with ε_t ~ N(0, .5) and a minimum span of MS
= .02N data points between knots. Each simulation consists of 100 replications.
The model selection criteria represented by the boxplots are, from left to right and
for each value of N: GCV*, AIC, PC, SC and AIC2. The true values of the model
coefficients, ρ1 = -.6, ρ2 = .6 and t = 0, are identified by the dashed line across each
of the boxplots.
Figure 49. THRESHOLD MODEL SIMULATION: Boxplots of the estimates for ρ1, ρ2 = -.6, .6,
and t = 0 when the model selection criterion within MARS correctly identified the
data as a threshold model (as reflected in Table 7). For increasing values of N, MARS
attempted to identify the threshold model (65) from the simulated data using P = 4
lagged predictor variables and M = 10, the maximum number of subregions allowed
in the forward-step MARS procedure, with ε_t ~ N(0, .5) and a minimum span of MS
= .02N data points between knots. Each simulation consists of 100 replications.
The model selection criteria represented by the boxplots are, from left to right and
for each value of N: GCV*, AIC, PC, SC and AIC2. The true values of the model
coefficients, ρ1 = -.6, ρ2 = .6 and t = 0, are identified by the dashed line across each
of the boxplots.






identify structure, although more than was actually present, which agrees with work done 
by Schwarz (1978). 

Another approach for determining the relative ability of each model selection
criterion is to evaluate its performance at approximating the response variable (output) in
terms of the explanatory variables (input). In this regard, the next section investigates
the ability of each model selection criterion to approximate the fitted values and limit cycle
from ASTAR Model 9 of the Wolf sunspot numbers.

D. SIMULATIONS OF ASTAR MODEL 9 OF THE WOLF SUNSPOT 
NUMBERS 

As an illustration of the relative ability of each model selection criterion within MARS
to closely approximate a representation of an actual time series, we used the fitted values
and limit cycle of ASTAR Model 9 of the Wolf sunspot numbers (66). In the first part
of this section the Wolf sunspot numbers and the fitted values and limit cycle from ASTAR
Model 9 are briefly reviewed. (Recall that Chapter II discussed the use of MARS for
modeling and prediction of the Wolf sunspot numbers, an actual time series with periodic
behavior. The result was ASTAR Model 9, which, when used for prediction, was a considerable
improvement over previously existing nonlinear models of the Wolf sunspot numbers.)
Next, two simulations are used to examine the ability of each model selection criterion to
closely approximate the fitted-value and limit-cycle time series of ASTAR Model 9 from the
lagged values of each respective time series with additive N(0,1) noise.

The sunspot data and the fitted values of ASTAR Model 9 (Figure 50) are quite 
‘periodic’ but have nonsymmetric cycles with extremely sharp peaks and troughs. The cycles 
generally vary between 10 and 12 years with the greater number of sunspots concentrated in 
each descent period versus the accompanying ascent period. The average (ascent/descent) 
period is (4.6/6.6) years for the sunspot number data and (4.5/6.4) years for the fitted 
values from ASTAR Model 9. The functional form of ASTAR Model 9 is 
$$
\begin{aligned}
X_t = {} & 2.711 + .960X_{t-1} + .332(47.0 - X_{t-5})_+ - .257(59.1 - X_{t-9})_+ \\
& - .003X_{t-1}(X_{t-2} - 26.0)_+ + .017X_{t-1}(44.0 - X_{t-3})_+ \\
& - .032X_{t-1}(17.1 - X_{t-4})_+ \\
& + .004X_{t-1}(26.0 - X_{t-2})_+(X_{t-5} - 41.0)_+ + \epsilon_t
\end{aligned}
\tag{66}
$$



where (x)_+ is the plus function, i.e., (x)_+ = x if x > 0 and (x)_+ = 0 otherwise, and ε_t
(from the analysis in Chapter II) is assumed to be Gaussian noise with zero mean and
variance σ². Model 9 has 14 parameters: 8 terms (a constant term, 3 one-way, 3 two-way
and 1 three-way interaction) and 6 threshold values (1 each on X_{t-2}, X_{t-3}, X_{t-4} and
X_{t-9}, and 2 on X_{t-5}). Note that the MARS algorithm generating ASTAR Model 9 uses
20 lagged predictor variables that are permitted to form 1-, 2-, and 3-way interactions
during a maximum of M = 15 steps of the forward-step MARS algorithm.
The minimum span between threshold knots is MS = 18 data points.
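Because every term in (66) is a product of lagged values and plus functions, the deterministic part (the "skeleton") of Model 9 is easy to evaluate directly. The sketch below uses hypothetical function names, assumes the coefficients and knots exactly as written in (66), and drops the noise term ε_t, which is how the limit cycle discussed below is generated. The 1700-1719 sunspot numbers needed to reproduce the actual limit cycle are not included here.

```python
def plus(v):
    """Plus function: (v)_+ = v if v > 0, else 0."""
    return v if v > 0 else 0.0


def model9_skeleton(lags):
    """Noise-free right-hand side of ASTAR Model 9, eq. (66).

    lags[k - 1] holds X_{t-k} for k = 1, ..., 9 (most recent value first).
    """
    x1, x2, x3, x4, x5, x9 = lags[0], lags[1], lags[2], lags[3], lags[4], lags[8]
    return (2.711
            + 0.960 * x1
            + 0.332 * plus(47.0 - x5)
            - 0.257 * plus(59.1 - x9)
            - 0.003 * x1 * plus(x2 - 26.0)
            + 0.017 * x1 * plus(44.0 - x3)
            - 0.032 * x1 * plus(17.1 - x4)
            + 0.004 * x1 * plus(26.0 - x2) * plus(x5 - 41.0))


def iterate_skeleton(initial, n_steps):
    """Drive Model 9 without noise.

    `initial` lists at least 9 starting values, oldest first; the function
    returns the n_steps values generated after them.
    """
    path = list(initial)
    for _ in range(n_steps):
        lags = path[-1:-10:-1]  # X_{t-1}, X_{t-2}, ..., X_{t-9}
        path.append(model9_skeleton(lags))
    return path[len(initial):]


print(round(model9_skeleton([0.0] * 9), 4))    # 3.1263
print(round(model9_skeleton([100.0] * 9), 3))  # 76.511
```

Passing the twenty 1700-1719 sunspot numbers as `initial` would, under these assumptions, produce the noise-free path from which a limit cycle such as that in Figure 51 is read off.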







Figure 50. The yearly Wolf sunspot numbers (1700-1955) versus the fit of ASTAR Model 9 
(1720-1920). The yearly sunspot numbers (1700-1719) were used for initialization. 
The yearly sunspot numbers (1921-1955) were used to examine the prediction
performance of ASTAR Model 9 and other models of the yearly sunspot numbers.
One of the interesting aspects of ASTAR models is their ability to create models with
limit cycles from periodic-like data such as the Wolf sunspot numbers. A limit cycle may be
thought of as a stationary state of sustained oscillations (Tong, 1985). Figure 51 shows the
137-year limit cycle of ASTAR Model 9 of the Wolf sunspot numbers with its ascent and
descent periods. The limit cycle for Model 9 is asymmetric, with amplitude ranging from 17.7
to 94.5 and an average ascent/descent period of 4.3/6.2 years versus the 4.6/6.6 years for
the actual yearly sunspot numbers from 1700 to 1920. In comparing Model 9's limit cycle
(Figure 51) with the real yearly sunspot data (Figure 50), note that the standard deviation
of the fitted residuals is estimated as (MSS)^(1/2) = 10.69 sunspots.



[Figure 51 annotation: ascent periods 4, 5, 4, 5, 4, 5, 4, 4, 4, 4, 4, 4, 5 years]




Figure 51. The limit cycle for ASTAR Model 9 of the yearly Wolf sunspot numbers (1720-1920).
The limit cycle is 137 years long with the indicated ascent and descent periods. The
limit cycle is generated using ASTAR Model 9 initialized with the yearly sunspot
numbers (1700-1719). The ‘subcycles’ have lengths of 10 or 11 years with 4 or 5
years per ascent period and 6 or 7 years per descent period.



1. Simulations of ASTAR Model 9 

Two different simulations using ASTAR Model 9 were developed to evaluate each
model selection criterion within MARS. The first simulation experiment used the fitted
values of Model 9 (Figure 50) plus additive N(0, σ_ε²) noise to examine the ability of each
criterion to model and estimate a time series with rapidly changing structure. The second
experiment used the limit cycle produced by Model 9 (Figure 51) plus additive N(0, σ_ε²)
noise to examine the ability of each criterion to model and estimate a time series with
repetitive structure. The objectives of these simulations are twofold: to observe how well
each model selection criterion estimates the fitted values and the limit cycle of ASTAR
Model 9, and to observe how consistent these estimates are. Mean square error (MSE) was
used as an overall measure of performance of each model selection criterion.

a. Simulating the Fitted Values of ASTAR Model 9 

In this first experiment, simulations of the fitted values of ASTAR Model 9
of the Wolf sunspot numbers were considered. Using the fitted values of ASTAR Model 9
to represent f(X), independent N(0,1) noise was added for the 221-year period from 1700
to 1920. The resulting values were used as the time series input to the MARS program.
The program parameters in MARS during each simulation remained the same as those used
to develop ASTAR Model 9: P = 20 lagged predictor variables, a maximum interaction level
of MI = 3 (i.e., the models were permitted to form 1-, 2-, and 3-way interactions), and a
minimum span between knots on a lagged predictor variable of MS = 18 data points. The
data values from 1700 to 1719 were used for model initialization. A total of 50 simulations
were performed for each model selection criterion and for each value of M, with M ranging
from 5 to 30 (M is the maximum number of forward steps in the MARS algorithm).
Note that a low value of M (e.g., 5 or 10) does not give a model selection criterion in
MARS enough forward steps to find the structure of Model 9's fitted values. In contrast,
a high value of M (e.g., 25 or 30) does permit the model selection criterion to find the
structure of Model 9's fitted values, but it then requires the backward-step algorithm to
trim excess terms from the model.

Table 8 and Figures 52 and 53 show examples of the results from the simulations
of the fitted values of ASTAR Model 9. For each value of M and each model
selection criterion, the bias and variance of each estimate of the 201 (1720-1920) actual
fitted sunspot numbers from ASTAR Model 9 were computed, using the results from the 50
simulation runs. The values in Table 8 represent the average across r = 1, ..., 201 of the
absolute bias, variance and MSE of the estimates of the fitted values from ASTAR Model 9
using each model selection criterion. As expected, bias dominated the MSE because of the
rapidly changing structure in the fitted values from Model 9. Also, as the models in MARS
are permitted to become more complex (M increases), the bias decreases and, in general,
the variance of the estimates of the fitted values from Model 9 increases.
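The three summary quantities reported in Table 8 (and in Table 9 below) can be computed from a matrix of replicated estimates. The sketch assumes the estimates are stored as an R × T array (here R = 50 replications and T = 201 time points) and uses the population variance (dividing by R), so that at every time point MSE = bias² + variance exactly; whether the original computation divided by R or R - 1 is not stated. `summarize_estimates` is a hypothetical name.

```python
import numpy as np


def summarize_estimates(estimates, truth):
    """Average |bias|, variance and MSE over time points.

    estimates: (R, T) array; row r is one simulation run, column j is the
               estimate of the j-th true value.
    truth:     length-T array of the true values (e.g. Model 9 fitted values).
    """
    est = np.asarray(estimates, dtype=float)
    f = np.asarray(truth, dtype=float)
    bias = est.mean(axis=0) - f          # bias at each time point
    var = est.var(axis=0)                # ddof=0: divides by R
    mse = ((est - f) ** 2).mean(axis=0)  # equals bias**2 + var column by column
    return float(np.abs(bias).mean()), float(var.mean()), float(mse.mean())


# Toy check with R = 2 runs and T = 2 time points.
print(summarize_estimates([[1.0, 3.0], [3.0, 1.0]], [1.0, 2.0]))  # (0.5, 1.0, 1.5)
```

Because the tables report the average of |bias| rather than of bias², the average MSE is not simply (average |bias|)² plus the average variance. By Jensen's inequality the average squared bias is at least the square of the average absolute bias, which is consistent with, for example, the GCV entry at M = 5 in Table 8 (12.07² ≈ 145.7, while the average MSE is 245.5).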

Figures 52 and 53 are plots of the bias [points] and the range of a 95%
confidence interval centered about zero (±1.64σ/N^(1/2)) [lines] for each of the 201 estimates
of the fitted values of Model 9 from the 50 simulations for each identified model selection
criterion. Figure 52 shows the results for the model selection criteria AIC (left column)
and GCV* (right column) with M = 10 (top) and M = 25 (bottom). Figure 53 shows the
results for the model selection criteria AIC (left column) and PC (right column) with
M = 10 (top) and M = 25 (bottom). Note the difference in the size of the bias and the
size of the confidence interval between the values of M in the top and bottom plots of each
figure: the bias for each estimate is, in general, smaller for M = 25, while the size of
the confidence interval generally increases. According to Table 8, the AIC criterion is better
than GCV* for both M = 10 and M = 25, while AIC is better than PC for M = 10, but the two
appear equivalent for M = 30. Comparing the plots of AIC and GCV* (Figure 52)
for M = 10 (top plots), note the high positive bias in several estimates of the fitted values
using GCV*, while for M = 25 (bottom plots) note the spread of the CI using GCV*.
Comparing the plots of AIC and PC (Figure 53) for M = 10 (top plots), note the
high positive bias in several estimates of the fitted values using PC. Looking across and
down in Figures 52 and 53 and at Table 8, the MSE of each model selection criterion
improves for increasing values of M, although the rate of improvement decreases as M
increases.

Using the simulation results from Table 8 and plots like those in Figures
52 and 53, AIC is the best model selection criterion for estimating the fitted values of
ASTAR Model 9 using MSE as the measure of performance. In Table 8, for each value of
M, the average absolute bias and average MSE across r for the AIC criterion are, in general,
lower than those of the other model selection criteria. Recall that AIC tends to over-parameterize
a model, which may explain AIC's performance in this experiment. The AIC criterion's
performance is closely followed by that of the PC, SC, and GCV* criteria. AIC2's performance
is extremely poor throughout the experiment.
TABLE 8. SIMULATION of the FITTED VALUES of ASTAR MODEL 9: The average across
r = 1, ..., 201 of the absolute bias, variance and MSE of the estimates for the fitted
values of ASTAR Model 9 from each model selection criterion within MARS using
50 simulations for increasing values of M, the maximum number of forward-step
subregions permitted in a MARS model. The MARS parameters for each of the 50
simulations and each model selection criterion are P = 20 and ε_t ~ N(0, 1), with a
minimum span of MS = 18 data points between threshold knots. Each simulation
consisted of estimating the 201 fitted values from ASTAR MODEL 9 of the Wolf
sunspot numbers with additive N(0,1) noise.



Average Absolute Bias
  M            5      10      15      20      25      30
  GCV      12.07    9.96    8.93    7.84    7.23    6.78
  AIC      11.53    9.66    8.40    7.61    6.98    6.48
  PC       12.05    9.96    8.84    7.73    7.14    6.49
  SC       11.59    9.92    8.72    8.01    7.55    7.34
  AIC2     11.94   10.81   11.56   11.54   11.66   11.77

Average Variance
  M            5      10      15      20      25      30
  GCV      0.408   0.750   0.949   1.040   1.043   1.024
  AIC      0.428   0.838   0.849   0.863   0.875   0.884
  PC       0.399   0.750   0.942   1.028   0.988   0.945
  SC       0.423   0.750   0.905   0.922   1.002   0.991
  AIC2     0.480   0.687   0.929   1.005   0.978   0.967

Average MSE
  M            5      10      15      20      25      30
  GCV      245.5   173.5   132.7   104.0    87.6    75.5
  AIC      235.1   158.4   117.8    96.8    82.0    70.9
  PC       244.8   173.5   130.7   101.1    84.6    69.7
  SC       237.3   172.6   126.4   107.6    95.4    89.9
  AIC2     251.8   239.0   223.0   220.6   223.9   228.5
Figure 52. SIMULATION of the FITTED VALUES of ASTAR MODEL 9: The bias (points)
and a 95% confidence interval centered about zero (lines) for the estimates of the
fitted values of ASTAR Model 9. The simulation experiment used 50 simulations
of the 221 fitted values from ASTAR Model 9 of the Wolf sunspot numbers with
additive N(0,1) noise. The plots in this figure are for the AIC [left] and GCV* [right]
model selection criteria using M = 10 [top] and M = 25 [bottom], the maximum
number of subregions permitted in the forward step of the MARS algorithm.
Figure 53. SIMULATION of the FITTED VALUES of ASTAR MODEL 9: The bias (points)
and a 95% confidence interval centered about zero (lines) for the estimates of the 
fitted values of ASTAR Model 9. The simulation experiment used 50 simulations 
of the 221 fitted values from ASTAR Model 9 of the Wolf Sunspot numbers with 
additive N(0,1) noise. The plots in this figure are for the AIC [left] and PC [right] 
model selection criteria using M = 10 [top] and M = 30 [bottom], the maximum 
number of subregions permitted in the forward step of the MARS algorithm. 
b. Simulations of the Limit Cycle from ASTAR Model 9

In the second experiment we considered simulations of the limit cycle developed
using ASTAR Model 9 of the Wolf sunspot numbers. Using (66) and the sunspot
numbers from 1700-1719 for initialization, Model 9's limit cycle is created by driving the
model without noise. The resulting 137-year limit cycle is shown in Figure 51. Using the
limit cycle of ASTAR Model 9 to represent f(X), independent N(0,1) noise was added to
431 limit cycle data values (3 cycles of 137 values and 20 values for initialization). The
resulting values were used as the time series input to the MARS program. The program
parameters in MARS during each simulation remained the same as those used to
develop ASTAR Model 9: P = 20 lagged predictor variables, a maximum interaction level of
MI = 3 (i.e., the models were permitted to form 1-, 2-, and 3-way interactions), and a
minimum span between variable partitions of MS = 18 data points. A total of 50 simulations
were performed for each model selection criterion for values of M ranging
from 5 to 15. Fewer forward steps are required in this experiment because of the
repetitiveness of Model 9's limit cycle.

Table 9 and Figures 54 and 55 show examples of the results from the second
simulation experiment. For each value of M and each model selection criterion, the bias and
variance of each estimate of the 411 limit cycle values were computed, using the results from
the 50 simulation runs. The values in Table 9 represent the average across r = 1, ..., 411 of
the absolute bias, variance and MSE of the estimates of the limit cycle values from ASTAR
Model 9 using each model selection criterion. The bias again dominates the MSE, although
it is not as large as the bias in the estimates of the fitted values from ASTAR Model 9
(Table 8). Note again that as the models in MARS are permitted to become more complex
(M increases), the bias decreases. Also, the variance of the estimates of the limit cycle
values is, in general, slowly decreasing.

Figures 54 and 55 are plots of the bias [points] and the range of a 95%
confidence interval centered about zero (±1.64σ/N^(1/2)) [lines] for each of the 411 estimates
of ASTAR Model 9's limit cycle using the 50 simulations and the identified model selection
criterion. Figure 54 shows the results for AIC (left column) and GCV* (right column) with
M = 5 (top) and M = 15 (bottom). Figure 55 shows the results for PC (left column) and
GCV* (right column) with M = 5 (top) and M = 15 (bottom). Again, note the difference
TABLE 9. SIMULATION of the LIMIT CYCLE VALUES of ASTAR MODEL 9: The average
across r = 1, ..., 411 of the absolute bias, variance and MSE of the estimates for the
limit cycle values from ASTAR Model 9 for each model selection criterion within
MARS using 50 simulations and increasing values of M, the maximum number of
forward-step subregions permitted in a MARS model. The MARS parameters for
each of the 50 simulations are P = 20 and ε_t ~ N(0, 1), with a minimum span of MS = 18
data points between model threshold knots. Each simulation consisted of estimating
411 values of ASTAR Model 9's limit cycle with additive N(0,1) noise (3 limit cycles
of 137 data values).





Average Absolute Bias
  M            5      10      15
  GCV       2.42    1.73    1.46
  AIC       2.21    1.61    1.44
  PC        2.42    1.74    1.47
  SC        2.21    1.63    1.63
  AIC2      2.21    1.94    1.82

Average Variance
  M            5      10      15
  GCV       0.12    0.11    0.10
  AIC       0.14    0.10    0.09
  PC        0.12    0.11    0.10
  SC        0.14    0.10    0.10
  AIC2      0.14    0.13    0.14

Average MSE
  M            5      10      15
  GCV       9.41    4.98    3.59
  AIC       8.10    4.32    3.48
  PC        9.41    5.00    3.63
  SC        8.03    4.45    3.87
  AIC2      7.95    5.83    5.21



in the size of the bias and the size of the confidence interval between the values of M in the 
top and bottom plots of each Figure; the bias and the variance for each estimate being, in 
general, smaller for M = 15. By looking across and down in Figures 54 and 55 and using 
Table 9 it is again observed that the MSE of each model selection criterion is improving for 
increasing values of M although again the rate of improvement decreases as M increases. 

Using the simulation results from Table 9 and plots like those in Figures 54
and 55, AIC is the best model selection criterion for estimating the limit cycle values of
ASTAR Model 9 using MSE as the measure of performance. The AIC criterion's performance
is closely followed by that of the SC, PC and GCV* criteria. AIC2's performance, initially
good at M = 5, is again poor for increasing values of M.

c. Summary of ASTAR Model 9 Simulations 

The AIC criterion performed very well for the simulations of the fitted values 
and the limit cycle of ASTAR Model 9. The performance of the AIC criterion was followed 
closely by the SC, PC and GCV* criteria, with SC initially doing better for lower values 
of M. Overall, the AIC2 criterion performed poorly throughout the experiment. 
Figure 54. SIMULATION of the LIMIT CYCLE from ASTAR MODEL 9: The bias (points)
and a 95% confidence interval centered about zero (lines) for the estimates of ASTAR
Model 9's limit cycle. The simulation experiment used 50 simulations of 411 values
of the limit cycle developed from ASTAR Model 9 of the Wolf sunspot numbers with
additive N(0,1) noise. The plots in this figure are for the AIC [left] and GCV* [right]
model selection criteria using M = 5 [top] and M = 15 [bottom], the maximum
number of subregions permitted in the forward step of the MARS algorithm.
Figure 55. SIMULATION of the LIMIT CYCLE from ASTAR MODEL 9: The bias (points)
and a 95% confidence interval centered about zero (lines) for the estimates of ASTAR 
Model 9’s limit cycle. The simulation experiment used 50 simulations of 411 values 
of the limit cycle developed from ASTAR Model 9 of the Wolf Sunspot numbers with 
additive N(0,1) noise. The plots in this figure are for the PC [left] and GCV* [right] 
model selection criteria using M = 5 [top] and M = 15 [bottom], the maximum 
number of subregions permitted in the forward step of the MARS algorithm. 
E. SMASTAR MODELING OF THE VATNSDALSA RIVERFLOW USING MARS 3.0

In Chapters II and III the GCV* criterion in MARS 2.0 was used to develop models of
the sunspot numbers and the Vatnsdalsa riverflow system. Chapter IV discussed the MARS
3.0 program and its modifications to facilitate time series analysis. One of the modifications
is an input variable, MSC, that allows the selection of different model selection criteria for
use in the MARS algorithm. In this section the objective is threefold: first, to discuss
the use of the model selection criteria investigated in Sections C and D of this chapter
in conjunction with the new capabilities of MARS 3.0; second, to determine whether the model
selection criteria in MARS 3.0 can be used to improve Model ICE486 of the Vatnsdalsa
riverflow system; and finally, to study the performance of the model selection criteria in
a more difficult setting. Note that the AIC2 criterion was not investigated in this section
because of its poor performance in Section D of this chapter.

Using MARS 3.0 and the GCV*, AIC, PC and SC model selection criteria, 2 Year
SMASTAR Models of the Vatnsdalsa Riverflow were developed using 731 days (2 years) of
riverflow for model development and the remaining 355 days for prediction. Models using
each criterion were permitted to form 1-, 2-, and 3-way interactions during a maximum of
M = 25 and 50 forward steps of the MARS algorithm, compared with the 10 to 20 forward
steps permitted during the Vatnsdalsa riverflow modeling discussed in Chapter III. The
minimum span between threshold values for a single predictor variable was 50 or 75 values.
The SMASTAR models were developed using a total of 27 predictor variables (9 lagged
predictor variables for each time series): lagged riverflow X_{t-1} to X_{t-9}, lagged precipitation
Y_{t-1} to Y_{t-9}, with and without the natural log transformation Y*_{t-i} = ln(1 + Y_{t-i}), and
lagged temperature Z_{t-1} to Z_{t-9}. The first 9 data values of each time series were used for
initialization.
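The arrangement of the 27 lagged predictors can be sketched as follows. This is an illustrative helper, not part of MARS 3.0: `lagged_design` is a hypothetical name, and it assumes three aligned daily series x (riverflow), y (precipitation) and z (temperature), drops the first 9 values for initialization, and optionally applies Y* = ln(1 + Y) via `numpy.log1p`.

```python
import numpy as np


def lagged_design(x, y, z, n_lags=9, log_precip=False):
    """Response X_t and predictors [X_{t-1..t-n}, Y_{t-1..t-n}, Z_{t-1..t-n}].

    The first n_lags values of each series serve only as initialization.
    """
    x, y, z = (np.asarray(s, dtype=float) for s in (x, y, z))
    if not len(x) == len(y) == len(z):
        raise ValueError("x, y and z must have the same length")
    if log_precip:
        y = np.log1p(y)  # Y*_t = ln(1 + Y_t)
    n = len(x)
    columns = []
    for series in (x, y, z):
        for i in range(1, n_lags + 1):
            # value at time t - i for every response time t = n_lags, ..., n - 1
            columns.append(series[n_lags - i:n - i])
    return x[n_lags:], np.column_stack(columns)


# Two years of daily data (731 values) leave 722 rows of 27 predictors.
_, design = lagged_design(np.zeros(731), np.zeros(731), np.zeros(731), log_precip=True)
print(design.shape)  # (722, 27)
```

Keeping the three blocks of lags in a fixed column order makes it straightforward to map any selected basis function back to its series and lag.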

Analysis of the fitted values and residuals of the models selected by the GCV*, AIC
and PC criteria indicates that these model selection criteria tend to create very large models
for the riverflow if the number of forward steps of the MARS algorithm is set high; i.e., when
compared to the SC criterion, the GCV*, AIC and PC criteria do not eliminate many terms
from the SMASTAR model during the backward step of the MARS algorithm. One result
of these large models is that the fitted residuals tend to have significant autocorrelation,
possibly due to overfitting. Also, when used for prediction, models developed using the
GCV*, PC and AIC criteria tend to have unpredictable results, i.e., violent changes in
behavior that at times can lead to negative riverflow (again, this may be due to overfitting).

Use of the GCV*, PC and AIC criteria necessitates judicious setting of the model parameter that fixes the maximum number of forward steps in the MARS algorithm. For example, Model ICE486, developed in Chapter III using the GCV* criterion, had 13 model terms with only 15 forward steps of the MARS algorithm. Yet in this experiment, a model developed using GCV* had 44 model terms with M = 50 forward steps of the MARS algorithm. When compared with the models developed using the SC criterion, the GCV* criterion appears to overparameterize the SMASTAR model.

This experiment also indicates that the final size of the model is due, in part, to the relationship between the two parts of each model selection criterion (model complexity and model lack-of-fit). Recall that the apparent overparameterization of linear time series models by the AIC criterion led to the development of the SC criterion (Schwarz, 1978), which increases the weight of the model complexity term by a factor of $0.5\ln(N)$. Thus, for a given value of $N$ (at least 8, so that $0.5\ln N > 1$), adding a term to a SMASTAR model improves (decreases) the SC criterion’s ‘score’ only if it produces a greater decrease in the model’s lack-of-fit than the AIC criterion would require.
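The effect of the heavier SC penalty can be made concrete with a short calculation. The sketch below assumes the common forms $\mathrm{AIC} = N\ln(\hat\sigma^2) + 2k$ and $\mathrm{SC} = N\ln(\hat\sigma^2) + k\ln N$, with $\hat\sigma^2 = \mathrm{RSS}/N$ and $k$ the number of parameters. The exact definitions used in Sections C and D of this chapter may differ in detail. Adding one parameter lowers a criterion only if $N\ln(\mathrm{RSS}_{\text{old}}/\mathrm{RSS}_{\text{new}})$ exceeds the increase in penalty, so the sketch reports the smallest relative decrease in RSS that each criterion demands. The value $N = 722$ assumes that the 731 days, less 9 initialization values, enter the fit.

```python
import math


def aic(rss: float, n: int, k: int) -> float:
    """Akaike-type criterion: lack-of-fit N*ln(RSS/N) plus complexity 2k."""
    return n * math.log(rss / n) + 2 * k


def sc(rss: float, n: int, k: int) -> float:
    """Schwarz-type criterion: same lack-of-fit, complexity k*ln(N)."""
    return n * math.log(rss / n) + k * math.log(n)


def min_relative_rss_drop(n: int, penalty_per_param: float) -> float:
    """Smallest fractional RSS decrease for which one extra parameter lowers the criterion.

    Solves n*ln(rss_old/rss_new) = penalty_per_param for 1 - rss_new/rss_old.
    """
    return 1.0 - math.exp(-penalty_per_param / n)


n = 722
aic_drop = min_relative_rss_drop(n, 2.0)
sc_drop = min_relative_rss_drop(n, math.log(n))
print(f"AIC needs a {100 * aic_drop:.2f}% drop in RSS, SC needs {100 * sc_drop:.2f}%")
print("SC/AIC penalty ratio:", math.log(n) / 2.0)
```

Because $\ln N > 2$ whenever $N \ge 8$, SC always demands the larger improvement for series of realistic length.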

Equation (67) details SMASTAR Model ICE SC160 for the Vatnsdalsa riverflow for the years 1972 and 1973, developed using the SC model selection criterion. Model ICE SC160 uses the natural log transformed precipitation and was permitted to form 1-, 2-, and 3-way interactions during a maximum of M = 50 forward steps of the MARS algorithm. The minimum span between threshold values for a single predictor variable was 50 data values. SMASTAR Model ICE SC160, which should be compared with SMASTAR Model ICE486 (equation 54), is
$$
\begin{aligned}
&+ 0.1940\,(X_{t-1}-3.98)_+(X_{t-2}-8.36)_+ \\
&- 0.0003\,(X_{t-1}-3.98)_+(X_{t-2}-8.36)_+(X_{t-6}-3.98)_+ \\
&- 4.572\,(0.182-Y^*_{t-2})_+ \\
&+ 0.574\,Y^*_{t-1}\,(Y^*_{t-4}-1.53)_+ \\
&- 0.662\,(X_{t-3}-9.02)_+(1.53-Y^*_{t-4})_+ \\
&+ 1.211\,(X_{t-1}-3.98)_+(Y^*_{t-2}-0.262)_+ \\
&- 1.321\,(X_{t-1}-3.98)_+(Y^*_{t-2}-1.07)_+ \\
&+ 1.035\,(X_{t-1}-3.98)_+(1.07-Y^*_{t-2})_+ \\
&- 0.145\,(X_{t-1}-3.98)_+(1.07-Y^*_{t-2})_+(Y^*_{t-4}-1.03)_+ \\
&+ 0.085\,(X_{t-1}-3.98)_+(8.36-X_{t-2})_+\,Y^*_{t-4} \\
&+ 0.023\,(X_{t-1}-3.98)_+(X_{t-2}-8.36)_+(0.262-Y^*_{t-9})_+ \\
&+ 0.0146\,(X_{t-1}-3.98)_+(3.00-Z_{t-1})_+ \\
&- 0.0035\,(X_{t-1}-3.98)_+(X_{t-2}-3.98)_+(3.00-Z_{t-1})_+ \\
&- 0.0\ldots\,(X_{t-1}-3.98)_+(Z_{t-1}-3.00)_+(4.80-Z_{t-7})_+ \\
&+ 0.0084\,(X_{t-1}-3.98)_+(Z_{t-1}+1.60)_+(3.50-Z_{t-6})_+ \\
&- 0.0033\,(X_{t-1}-3.98)_+(X_{t-3}-3.98)_+(Z_{t-1}+1.60)_+ \\
&- 0.0081\,(X_{t-1}-3.98)_+(X_{t-2}-8.36)_+(Z_{t-2}+22.4)_+
\end{aligned}
\tag{67}
$$
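Each term of equation (67) is a coefficient times a product of truncated linear functions $(x - c)_+ = \max(0, x - c)$ or $(c - x)_+$, sometimes with an untruncated linear factor. The Python sketch below evaluates such terms. Its representation (a coefficient plus a list of `(variable, knot, sign)` factors) is a hypothetical encoding chosen for illustration, not the internal format of MARS 3.0. Only two terms of equation (67) are encoded, so the result is a partial sum rather than a riverflow prediction.

```python
from typing import Dict, List, Optional, Tuple

# (variable name, knot or None for a linear factor, +1 for (x - knot)+, -1 for (knot - x)+)
Factor = Tuple[str, Optional[float], int]
Term = Tuple[float, List[Factor]]


def hinge(x: float, knot: float, sign: int) -> float:
    """Truncated linear spline: (x - knot)+ if sign == +1, (knot - x)+ if sign == -1."""
    return max(0.0, sign * (x - knot))


def term_value(term: Term, values: Dict[str, float]) -> float:
    """Coefficient times the product of the term's factors at the given lagged values."""
    coef, factors = term
    out = coef
    for var, knot, sign in factors:
        x = values[var]
        out *= x if knot is None else hinge(x, knot, sign)
    return out


# Two terms transcribed from equation (67): the first two-way term and the one-way term.
terms: List[Term] = [
    (0.1940, [("X[t-1]", 3.98, +1), ("X[t-2]", 8.36, +1)]),
    (-4.572, [("Y*[t-2]", 0.182, -1)]),
]

lagged = {"X[t-1]": 5.0, "X[t-2]": 10.0, "Y*[t-2]": 0.0}
partial = sum(term_value(t, lagged) for t in terms)
print(round(partial, 6))
```

Because every factor is continuous in its argument, the fitted surface is continuous across each threshold, which is the property of MARS models noted in the thesis summary.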

Model ICE SC160 (Figure 56) has 32 parameters, comprising 19 terms (a model constant term and 1 one-way, 7 two-way and 10 three-way interactions) and 13 threshold values (1 each on $X_{t-2}$, $X_{t-3}$, $X_{t-4}$, $Y^*_{t-9}$, $Z_{t-6}$ and $Z_{t-7}$, 2 each on $Y^*_{t-4}$ and $Z_{t-1}$, and 3 on $Y^*_{t-2}$). The standard error of the fitted residuals is $\sigma_e = 1.10$ m³/sec for Model ICE SC160, versus $\sigma_e = 1.21$ m³/sec for Model ICE486 developed using GCV*. Figure 56 shows plots of the fitted values and residuals of Model ICE SC160 for the Vatnsdalsa riverflow data during 1972 and 1973.

Model ICE SC160 (Figure 56) appears to overfit and underfit the peaks and troughs about equally while capturing the general structure of the riverflow data. Analysis of the normal probability plot (not shown) shows that the fitted residuals are still slightly skewed with
Figure 56. The Vatnsdalsa riverflow data for years 1972 and 1973 versus the fitted values (top) and residuals (bottom) for SMASTAR Model ICE SC160. The SMASTAR model for the riverflow at time $t$, $X_t$, was a function of lagged riverflow $X_{t-1}$ to $X_{t-9}$, lagged precipitation $Y^*_{t-1}$ to $Y^*_{t-9}$ (i.e., the natural log transformation $Y^*_{t-i} = \ln(1 + Y_{t-i})$), and lagged temperature $Z_{t-1}$ to $Z_{t-9}$. Model ICE SC160 has 32 parameters, comprising 19 terms (a model constant term and 1 one-way, 7 two-way and 10 three-way interactions) and 13 threshold values (1 each on $X_{t-2}$, $X_{t-3}$, $X_{t-4}$, $Y^*_{t-9}$, $Z_{t-6}$ and $Z_{t-7}$, 2 each on $Y^*_{t-4}$ and $Z_{t-1}$, and 3 on $Y^*_{t-2}$). The standard error of the fitted residuals is $\sigma_e = 1.10$ m³/sec. The initial nine values of each time series were used to initialize the model.
the extremely heavy tails that have occurred in previous modeling efforts with this type of riverflow data. The fitted residual autocorrelation function and the estimated normalized cumulative periodogram are shown in Figure 57. As with Model ICE486 using GCV* (Figure 31), the autocorrelation function of the fitted residuals reveals no evidence of short-term autocorrelation. Also, as with Model ICE486, we could consider the residuals independent if they were normally distributed, because the normalized cumulative spectrum of the fitted residuals falls entirely within the 90% K-S bounds about the cumulative spectrum for white noise. However, the fitted residuals again display a pattern of high residual values during periods of high riverflow (Figure 56), which is evidence of the non-normality of the fitted residuals.

To investigate the predictive performance of Model ICE SC160, developed and discussed above, Model ICE SC160 and the riverflow, precipitation and temperature data for the year 1974 were used to perform 1-day forward-step ahead predictions of the Vatnsdalsa riverflow. Overall, the predictions of Model ICE SC160 differ only slightly from the 1-day forward-step ahead predictions of Model ICE486 using GCV*. Figures 58 and 59 contain plots of the actual riverflow versus the 1-day forward-step ahead predictions of Model ICE SC160, together with the corresponding residuals, for the Vatnsdalsa riverflow during the year 1974. Again, the 1-day forward-step ahead predictions were performed using both coefficient updating and a fixed coefficient model. In both cases, the model predictions react very well both to the extreme spring transition and to the low riverflow that occurs later in the year. However, as expected, the 1-day forward-step ahead predictions using coefficient updating (Figure 58) are an improvement over those using the fixed coefficient model (Figure 59). The standard error of the fitted residuals using coefficient updating is $\sigma_e = 2.08$ m³/sec for Model ICE SC160 using SC, versus $\sigma_e = 2.11$ m³/sec for Model ICE486 using GCV*. The standard error of the fitted residuals using the fixed coefficient model is $\sigma_e = 2.67$ m³/sec for Model ICE SC160 using SC, versus $\sigma_e = 2.37$ m³/sec for Model ICE486 using GCV*. The predictive capability of the two models is similar. Note that Model ICE SC160 has a slightly smaller fitted residual variance than Model ICE486 for the coefficient updating method, while the opposite is true for the fixed coefficient method. However, Model ICE486 was developed in a restrictive environment with only M = 15 forward steps of the MARS algorithm. Model ICE SC160, by contrast, was developed in a less restrictive, and thus preferable, environment with M = 50 forward steps of the MARS algorithm.
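The difference between the two prediction schemes can be sketched as follows. This section does not spell out the updating procedure, so the sketch assumes one plausible reading. The basis functions (the products of truncated linear functions) are held fixed. "Coefficient updating" then means re-estimating their linear coefficients by ordinary least squares on all data observed up to day $t-1$ before predicting day $t$, whereas the fixed model keeps the coefficients estimated from the development period. The function names are hypothetical, and the normal-equation solver is adequate only for small, well-conditioned illustrations.

```python
import math
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """OLS coefficients via the normal equations (B'B) beta = B'y."""
    k = len(rows[0])
    btb = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    bty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(btb, bty)


def one_step_predictions(rows, y, n_fit: int, update: bool) -> List[float]:
    """Predict y[t] for t >= n_fit from basis values rows[t].

    update=False: coefficients fitted once on the first n_fit observations.
    update=True: coefficients refitted on observations 0..t-1 before each prediction.
    """
    beta = least_squares(rows[:n_fit], y[:n_fit])
    preds = []
    for t in range(n_fit, len(y)):
        if update:
            beta = least_squares(rows[:t], y[:t])
        preds.append(sum(b * v for b, v in zip(beta, rows[t])))
    return preds


def rms_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square prediction error (one common 'standard error' summary)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(predicted))


# Toy data: intercept plus one hinge basis (x - 3)+, with a level shift from x = 8 on.
xs = list(range(12))
rows = [[1.0, max(0.0, x - 3.0)] for x in xs]
y = [2.0 + 0.5 * max(0.0, x - 3.0) + (1.0 if x >= 8 else 0.0) for x in xs]
fixed = one_step_predictions(rows, y, n_fit=6, update=False)
updated = one_step_predictions(rows, y, n_fit=6, update=True)
print(rms_error(y[6:], fixed), rms_error(y[6:], updated))
```

In this toy case, the updated coefficients partly absorb the level shift, so their one-step errors shrink after the change, while the fixed model keeps a constant bias.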
Figure 57. Fitted Residual Plots from SMASTAR Model ICE SC160. The autocorrelation function (first 20 lags) [top] and the normalized cumulative periodogram [bottom] of the fitted residuals from SMASTAR Model ICE SC160 of the Vatnsdalsa River system for the period 1972-1973. The top plot, with approximate 95% individual confidence bounds, shows that no apparent autocorrelation exists in the fitted residuals. Also, the K-S bounds in the normalized cumulative periodogram plot indicate no departure from a flat spectrum, so that if the residuals are normally distributed, they are independent.
Figure 58. The actual versus 1-step ahead predictions and errors from Model ICE SC160 for the Vatnsdalsa riverflow data (1974) with coefficient updating (coefficient update). The standard error of the fitted residuals is $\sigma_e = 2.08$ m³/sec for Model ICE SC160, versus $\sigma_e = 2.11$ m³/sec for Model ICE486 using GCV*.
Figure 59. The actual versus 1-step ahead predictions and errors from Model ICE SC160 for the Vatnsdalsa riverflow data (1974) without coefficient updating (fixed model). The standard error of the fitted residuals is $\sigma_e = 2.67$ m³/sec for Model ICE SC160, versus $\sigma_e = 2.36$ m³/sec for Model ICE486 using GCV*.
F. SUMMARY



This chapter examined the problem of model dimension and variable selection when 
using adaptive regression splines to develop a nonlinear autoregressive model for a univariate 
or semi-multivariate time series system. Five model selection criteria, GCV*, AIC, AIC2,
PC and SC, were examined to determine which performed best within MARS. The results 
indicate that SC is the best model selection criterion for use in MARS for a time series 
setting. The SC criterion (Schwarz, 1978; Rissanen, 1978) consistently performed well for 
all experiments conducted in this chapter and appears to best accommodate the forward and 
backward stepwise MARS strategy for model development. In contrast, the AIC criterion 
appeared to overparameterize models, which agrees with findings by Schwarz (1978) and
others. Model over-parameterization was also a characteristic of the PC and GCV* criteria 
when modeling the Vatnsdalsa riverflow in an ‘unrestricted’ environment (large M, the 
number of forward steps in the MARS algorithm). The AIC2 criterion performed very 
poorly during simulations of the fitted values and limit cycle of ASTAR Model 9 of the 
Wolf sunspot numbers. 

Thus the SC criterion is recommended for model selection when MARS is applied in 
a time series setting. 
VI. THESIS SUMMARY



MARS is a new nonparametric regression modeling methodology, due to Friedman, that utilizes low-order regression spline modeling and a modified recursive partitioning strategy to exploit the localized low-dimensional behavior of the data used to construct $\hat{f}(\mathbf{x})$. Given a set of predictor variables, MARS fits a model in the form of an expansion in
product spline basis functions of predictors chosen during a forward and backward recursive 
partitioning strategy. Although MARS is a computationally intensive regression modeling 
methodology, it provides a systematic (automatic) method for deriving nonlinear threshold 
models for high-dimensional data. The MARS models are naturally continuous in the 
domain of the predictor variables, and can have multiple partitions and predictor variable 
interactions. 

Within MARS, by letting the predictor variables for the $t$th value in a time series $\{X_t\}$ be its lagged values, i.e., $X_{t-1}, X_{t-2}, \ldots, X_{t-p}$, one obtains an adaptive spline threshold autoregressive (ASTAR) model. This is a new method for systematic nonlinear modeling of time series that extends the threshold autoregressive (TAR) model due to Tong (1985). Simulations of autoregressive and nonlinear threshold models are used to show the ability of ASTAR to model simple time series. A significant feature of ASTAR models, when modeling time series data with periodic behavior, is their ability to produce continuous models with underlying sustained oscillations (limit cycles). The initial analysis of the yearly Wolf sunspot numbers (1700-1890) and (1700-1920) using ASTAR produced several models with underlying limit cycles. When used to predict the yearly sunspot numbers (1921-1955), the ASTAR models are a significant improvement over existing Threshold and Bilinear models.

Letting the predictor variables within MARS be not only the lagged values of the time series being modeled and predicted, but also the lagged values of other related time series, results in a semi-multivariate adaptive spline threshold autoregressive (SMASTAR) model. This investigation indicates that SMASTAR models appear well suited for taking into account the complex nonlinear interactions among multivariate, cross-correlated, lagged predictor variables of a time series system. Using the Vatnsdalsa riverflow system as an example, Tong et al. (1985) showed that normal linear autoregressive models were incapable of modeling the complexities of this type of time series system. Also, the methodology and structure of semi-multivariate TAR models appear incapable of capturing these complexities in a parsimonious model. However, the SMASTAR model appears to account for the complex relationships between the cross-correlated predictor variables, and it seems capable of providing semi-multivariate nonlinear time series models for prediction, even in non-normal situations such as riverflow data. SMASTAR model development, although computationally intensive, is also quite systematic.

An important aspect of any overall regression modeling effort is the interpretation and 
analysis of a regression model. However, the functional form of an ASTAR model, with 
its combination of different predictor variables and multiple threshold values, makes its 
straightforward interpretation and analysis difficult. In this regard a graphical represen- 
tation was developed to permit the interpretation and analysis of ASTAR models. It was 
shown that this graphical representation can be used to analyze the use for and contribution 
of each of the terms in an ASTAR model. The extension of this graphical representation 
to SMASTAR models is difficult. However, it was shown that the tree-like structure of a 
MARS model can be used to analyze the use for and contribution of each of the terms in a 
SMASTAR model. 

The current model selection criterion in MARS is GCV*, a modified form of generalized cross validation. However, other model selection criteria, such as Akaike’s Information Criterion (AIC), have been suggested for model selection in the standard linear time series setting. In this regard, simulations were used to investigate GCV* and several other model selection criteria for use within MARS. The results indicate that the Schwarz-Rissanen SC criterion and Amemiya’s PC criterion improve the model selection over GCV* when MARS is used in a time series setting. The simulation experiments identified the potential for overparameterization by AIC that has been identified by Schwarz and others. The PC and GCV* criteria also appeared to create unnecessarily large models (lack of parsimony) when used for model selection of the Vatnsdalsa riverflow system. Thus, in a time series setting, the SC criterion is the recommended model selection criterion for use within MARS.

The application of MARS for nonlinear modeling of univariate and semi-multivariate 
time series systems is a new and exciting methodology. However, there is still a need for 
additional investigation of many aspects of this application. Additional comments, questions
and areas for further research include:

1. It is important to note that as with any regression or time series modeling effort, one 
can never be sure that one has all the relevant predictor variables. However, MARS 
version 3.0 has been modified and Fortran programs written to permit analysis of 
univariate and multivariate time series systems. The modifications include the ability 
to select one of several model selection criteria that have been proposed for use
in a time series setting. What other structural and methodological modifications are 
needed to improve nonlinear modeling of time series using MARS? 

2. As discussed in Chapter IV, the MARS 3.0 program permits the use of categorical
variables such as wind velocity or circular wind direction. The use of categorical
variables in SMASTAR time series models has not been investigated. However, this 
time series modification to MARS appears to provide an innovative approach for 
including influential lagged categorical variables. 

3. A constant source of concern when modeling time series data is that of variance homogeneity and independence of the error term in the model

$$X_t = f(X_{t-1}, \ldots, X_{t-p}) + \varepsilon_t .$$

Residual analysis of the yearly sunspot data model supported this assumption of homoscedasticity and independence. Where the assumption does not hold, an initial attempt to overcome this would be, for positive data, to use a log transformation, as was done for the precipitation data from the Vatnsdalsa riverflow system. However, there is no guarantee that this would work. For instance, if the data were generated by an ARCH model (Tong, 1990, pp. 116-117), then a simple transformation of the data, such as a log transformation, would not work. Other questions involve normality of the errors.

4. MARS selects a model using exhaustive search and is a computer-intensive methodology. Note that it is quite systematic, and interactions and thresholds are selected by the stepwise methodology. Also, some of the predictor variables used as candidates for the model may be rejected during the forward step of the MARS algorithm, i.e., MARS uses subset selection of the available predictor variables. Tsay (1989) has developed procedures for threshold variable selection and a statistic to test threshold values within the framework of TAR. These statistics, and graphical methods that parallel the methods discussed for the Wolf sunspot numbers in Chapter II, need to be incorporated, in a fairly automatic way, in the MARS methodology for time series analysis.

5. Can MARS be used to measure the degree of nonlinearity of a time series system? 

6. An issue of concern in the general application of MARS is the method for determin- 
ing the ‘correct’ number of degrees-of-freedom to charge for variable and threshold 
value selection. This issue is even less clear across the lagged predictor variables of a 
univariate or semi-multivariate time series system. 
7. As discussed in Chapter IV there are several model parameters that must be set to
initialize the MARS algorithm. It is preferable to set the parameters to be as unre- 
strictive as possible and permit the model selection criterion and the data to determine 
the final model form. In this regard, the guidance (obtained mostly by practical ex- 
perience) offered by Friedman (1991) for setting the MARS model parameters in the 
general setting appears very robust. However, the practical application of MARS to 
time series has been limited to this investigation. 

8. The use of ASTAR and SMASTAR models for modeling and analysis followed by the 
simulation of complex, nonlinear systems is discussed in Lewis and Stevens (1990). 
Frequently, individual inputs of complex global system models are the result of the 
analysis of subcomponent systems. If the subcomponent systems are time series sys- 
tems then the reduction of these nonlinear time series systems to a tractable model 
form such as provided by ASTAR and SMASTAR time series models may enhance 
the efficiency and accuracy of global system inputs. 
APPENDIX A. FORTRAN BATCH FILE FOR DEVELOPING
ASTAR AND SMASTAR TIME SERIES MODELS USING THE MARS 3.0 PROGRAM



@ECHO OFF

This file is marstsa.bat. 01 July 1991 

*****************************************************************

PC BATCH FILE FOR INITIATING MARS3.0a TIME SERIES RUNS FOR
MICROWAY NDP FORTRAN 2.1.4 UNDER DOS USING THE WEITEK
COPROCESSOR. CHANGE THE -N4 PARAMETER TO -N2 IN THE MARCL.BAT
FILE AND RECOMPILE THE MARS3.0A, MARSBLDa AND MARSDRVa FORTRAN
PROGRAMS TO USE THE INTEL OR CYRIX COPROCESSORS. Does not run
under Ver 2.0.6 of MICROWAY NDP FORTRAN, and has not been tried
under their later versions. At least Version 3 exists, maybe
Version 4.

***************************************************************** 
J. STEVENS - L. URIBE - P.A.W. LEWIS. e-mail 1526P@NAVPGS.BITNET
***************************************************************** 



THIS EXEC PREPARES THE INPUT REGRESSION MATRIX FOR FRIEDMAN’S 
MARS3.0a OUT OF 1, 2 OR 3 TIME SERIES USING THE MARSBLDa FORTRAN 
PROGRAM. AFTER THAT IT CALLS THE MARSDRVa FORTRAN PROGRAM WHICH 
PERFORMS THE MARS REGRESSION, FIRST COMPUTING ALL THE ARRAY 
SPACE ALLOCATIONS NEEDED IN MARS3.0a IN AN AUTOMATED WAY. THIS 
RELIEVES THE USER FROM THE BURDEN OF SUCH GUESSWORK. THE ONLY 
VALUE THAT MAY NEED ADJUSTMENT FROM THE USER IS ON THE 2ND LINE
OF MARSDRVa FORTRAN PROGRAM, WHERE THE SIZE FOR THE PARAMETER NV 
APPEARS. THIS PARAMETER IS USED TO INCREASE OR REDUCE THE 
AMOUNT OF MEMORY AVAILABLE FOR MARS3.0a. IN SUCH CASE MARSDRVa 
NEEDS TO BE RECOMPILED PRIOR TO RUNNING THIS BATCH EXEC. 

PROGRAM MARSBLDa PROMPTS FOR THE NAMES OF UP TO 3 TIME SERIES FILE 
NAMES. THE 1ST ONE HAS THE TIME SERIES BEING PREDICTED, FROM LAGGED 
VALUES OF ITSELF, AND IT MUST ALWAYS BE PRESENT. THE OTHER 2 TIME
SERIES ARE OPTIONAL PREDICTOR TIME SERIES. PRESS THE 
<ENTER> KEY ALONE WHEN ANY OF THESE 2 TIME SERIES IS NOT USED. 

OUTPUT RESULTS FROM MARS3.0a AND AN ADDITIONAL INTERACTIONS REPORT
APPEAR ON FILE UNIT 06 (MARS3.OUT)



INPUT FILE(S) FORMAT: (See Definitions Below) 

RECORD 1: N,P,MI,NK,NGC,NGS,M,ICX,MS,DF,MSC 

RECORD 2: LX(I), I=1,P 

RECORD 3: LAG(I), I=1,P

RECORDS 4-END: TIME SERIES VALUES

ALL THE ABOVE INFORMATION IS ENTERED IN FREE FORMAT (JUST ONE OR
MORE SPACES BETWEEN VALUES). RECORD 2 AND RECORD 3 CAN BE MULTIPLE
RECORDS THEMSELVES WHEN P IS LARGE. THE ARRAYS CAN BE ENTERED,
FOR EXAMPLE, 20 VALUES PER LINE.

PARAMETER DEFINITIONS: 

N=NO. OF VALUES IN THE TIME SERIES (ALL 3 MUST BE EQUAL)

P=NO. OF PREDICTORS FROM THIS TIME SERIES
MI=MAX. NO. OF INTERACTIONS
NK=MAX. NO. OF BASIS FUNCTIONS

NGC=NO. OF RASTER POINTS FOR PLOTTING (SET TO 0 FOR NO PLOT)
NGS=NO. OF R.P. ON EACH AXIS FOR PLOTS ( " )

M=MODEL FLAG: 1=PLOT PIECEWISE LINEAR, 2=PLOT PIECEWISE CUBIC
ICX=CONVEX HULL FLAG: 0=PLOT SURFACE OVER ENTIRE RANGE OF ARGS.

>0=PLOT SURF. OVER INSIDE CONVEX HULL
MS=MIN. SPAN (MIN NO. OBSERVATIONS BETWEEN KNOTS)

DF=NO. OF DEGREES OF FREEDOM

MSC=MODEL SELECTION CRITERIA (1=GCV, 2=AIC, 3=PC, 4=SC)

ENTER 0 FOR THOSE PARAMETER VALUES NOT APPLICABLE TO A GIVEN RUN.

LX=PREDICTOR VAR. FLAG: 0=EXCLUDE VARIABLE FROM MODEL

1=ORDERABLE VARIABLE. NO RESTRICTION
2=ORDERABLE VAR. ADDITIVE. NO INTERACTS.
3=ORDERABLE VARIABLE LINEAR ONLY.
-1=CATEGORICAL VAR. NO RESTRICTION.
-2=CATEGORICAL VAR. ADDITIVE. NO INTERACTS
LAG=LAGS TO USE TO GENERATE PREDICTORS FROM THIS TIME SERIES.

TIME SERIES VALUES: THEY FOLLOW IN FREE FORMAT, AS MANY PER 

RECORD AS DESIRED. 

THE TOTAL NO. OF PREDICTORS IS THE SUM OF THE PREDICTORS FOR EACH 
INPUT TIME SERIES. 
NOTE THAT ALL VALUES ON RECORD 1 OF THE 3 FILES MUST BE THE SAME
EXCEPT FOR P THE NUMBER OF PREDICTORS.

BREAK ON 
@ECHO ON

SET SAVPTH=%PATH%

Save old Path, and below create one for the MARS zim.

PATH c:\;c:\dos401;D:\;d:\NDP20;d:\NDP20\MARSNEW
RUN386 MARSBLDa
RUN386 MARSDRVa
PATH=%SAVPTH%
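To make the input format described in the comments above concrete, the following Python sketch writes the text of one time series file. Record 1 holds the eleven control values, record 2 the LX flags, record 3 the lags, and the remaining records the series values, all in free format. `write_series_file` is a hypothetical helper, not part of the original programs. It enforces the strictly ascending lags that subroutine GETDATA checks for, and it writes at most 20 values per line, as suggested above.

```python
from typing import List, Sequence


def write_series_file(values: Sequence[float], lx: Sequence[int], lags: Sequence[int],
                      mi: int, nk: int, ms: int, df: float, msc: int,
                      ngc: int = 0, ngs: int = 0, m: int = 1, icx: int = 0,
                      per_line: int = 20) -> str:
    """Return the text of one MARSBLD input file (free format, space separated)."""
    if len(lx) != len(lags):
        raise ValueError("LX and LAG must both have P entries")
    if any(b <= a for a, b in zip(lags, lags[1:])):
        raise ValueError("lags must be strictly ascending")  # mirrors the GETDATA check
    if msc not in (1, 2, 3, 4):
        raise ValueError("MSC must be 1=GCV, 2=AIC, 3=PC or 4=SC")

    def chunked(items: Sequence[object]) -> List[str]:
        return [" ".join(str(v) for v in items[i:i + per_line])
                for i in range(0, len(items), per_line)]

    # RECORD 1: N,P,MI,NK,NGC,NGS,M,ICX,MS,DF,MSC
    record1 = [len(values), len(lx), mi, nk, ngc, ngs, m, icx, ms, float(df), msc]
    lines = [" ".join(str(v) for v in record1)]
    lines += chunked(list(lx)) + chunked(list(lags)) + chunked([float(v) for v in values])
    return "\n".join(lines) + "\n"


text = write_series_file([1.0, 2.0, 3.0], lx=[1, 1], lags=[1, 2],
                         mi=3, nk=50, ms=50, df=3.0, msc=4)
print(text)
```

DF is written as a real number because MARSBLD reads it into a real variable, implicitly typed since its name begins with D.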
APPENDIX B. NDP FORTRAN PROGRAM FOR BUILDING THE
INPUT TO THE MARS 3.0 PROGRAM FOR ASTAR AND 
SMASTAR TIME SERIES MODEL DEVELOPMENT 



c This is the MARSBLD.F Fortran Program 01 July 1991 

c ***************************************************************** 
c J. STEVENS - L. URIBE - P.A.W. LEWIS. e-mail 1526P@NAVPGS.BITNET

c ***************************************************************** 

C --- BUILD THE STD INPUT DATA FILE FOR MARSDRV WITH 1 TO 3 SERIES

C CALLS FOR UP TO 3 INPUT FILES, WHOSE FORM IS GIVEN IN THE 

C MARSTSA.BAT FILE WHICH CALLS THIS FILE. 

C PARAMETER DEFINITIONS GIVEN IN THE MARSTSA.BAT FILE 

INTEGER P, P1,P2,P3 
PARAMETER(MXP=100,MXN=10000)

INTEGER LX1(MXP),LX2(MXP),LX3(MXP),LG1(MXP),LG2(MXP),LG3(MXP)
REAL X(MXN,MXP),Y(MXN),W(MXN),X1(MXN),X2(MXN),X3(MXN)

CHARACTER*12 FN1,FN2,FN3,FOUT

DATA P1,P2,P3,NXX,NX2,NX3 /0,0,0,0,0,0/

C 

* DATA INPUT LINE 1 PARAMETERS --- NXX,P,MI,NK,NGC,NGS,M,ICX,MS,DF,MSC

* LINE 2 --- LX

* LINE 3 --- DESIRED LAG VARIABLES. ORDERED

* REST OF FILE --- TIME SERIES

C 

FOUT='MARS30a.DAT'

OPEN(10,FILE=FOUT)

C

WRITE(6,*) 'UNDER IBM CMS FILE NAMES MUST BEGIN WITH A /'
WRITE(6,*) 'ENTER 1ST TIME SERIES FILE NAME (IN QUOTES):'
READ(5,*) FN1
OPEN(7,FILE=FN1,ERR=999)

CALL GETDATA(7, NXX, P1,MI,NK,NGC,NGS,M,ICX,MS,DF,LX1,LG1,X1, 

* MXN,MXP,MSC) 

C 

WRITE(6,*) 'ENTER 2ND TIME SERIES FILE NAME (IN QUOTES):'
WRITE(6,*) 'IF NOT APPLICABLE JUST TYPE ONE SPACE IN QUOTES'
READ(5,*) FN2

IF(FN2.NE.'/' .AND. FN2.NE.' ') THEN
OPEN(8,FILE=FN2,ERR=999)
CALL GETDATA(8,NX2,P2,I,I,I,I,I,I,I,A,LX2,LG2,X2,

* MXN,MXP,I)

ENDIF 

WRITE(6,*) 'ENTER 3RD TIME SERIES FILE NAME (IN QUOTES):'
WRITE(6,*) 'IF NOT APPLICABLE JUST TYPE ONE SPACE IN QUOTES'
READ(5,*) FN3

IF(FN3.NE.'/' .AND. FN3.NE.' ') THEN
OPEN(9,FILE=FN3,ERR=999)

CALL GETDATA(9,NX3,P3,I,I,I,I,I,I,I,A,LX3,LG3,X3,

* MXN,MXP,I)

ENDIF 

if(nxx.eq.0 .or. nxx.ne.nx2 .and. nx2.gt.0 .or.

* nxx.ne.nx3 .and. nx3.gt.0) then
write(6,*) 'series are not of the same length',nxx,nx2,nx3
close(10)

stop

endif

C

L2=0

L3=0

if(p2.gt.0) L2=lg2(p2)
if(p3.gt.0) L3=lg3(p3)

LP=MAX(LG1(P1), L2, L3) 

N=NXX-LP 

P=P1+P2+P3 

C --- WEIGHTS W, RESPONSE Y BUILT FROM 1ST TIME SERIES
DO 100 II=1,N
W(II)=1. 

Y(II)=X1(II+LP) 

100 CONTINUE 
C 

C BUILD THE REGRESSION X MATRIX 

DO 101 II=1,N
DO 102 JJ=1,P1 

X(II,JJ)=X1(II+LP-LG1(JJ)) 

102 CONTINUE 

101 CONTINUE 
C 

DO 103 II=1,N
DO 104 JJ=1,P2 

X(II,JJ+P1)=X2(II+LP-LG2(JJ)) 

104 CONTINUE 

103 CONTINUE 
c

DO 105 II=1,N
DO 106 JJ=1,P3 

X(II,JJ+P1+P2)=X3(II+LP-LG3(JJ)) 

106 CONTINUE 
105 CONTINUE 
C 

C --- BUILD MARSDRV INPUT FILE 

WRITE(10,114) N,P1,P2,P3,MI,NK,NGC,NGS,M,ICX,MS,DF,MSC 
WRITE(10,111) (LX1(I),I=1,P1),(LX2(I),I=1,P2),(LX3(I),I=1,P3)
WRITE(10,116) (LG1(I),I=1,P1),(LG2(I),I=1,P2),(LG3(I),I=1,P3)
WRITE(10,112) (W(I),I=1,N)

DO 110 I=1,N

WRITE(10,112) (X(I,J),J=1,P), Y(I) 

110 CONTINUE 
RETURN 

999 CONTINUE 

WRITE(6,*) 'FILE NOT FOUND FOR THIS TIME SERIES'

STOP 

C 

111 FORMAT(20I3)

112 FORMAT(14F10.5)

114 FORMAT(11I5,F5.1,I5)

116 FORMAT(20I5)

END 

C 

SUBROUTINE GETDATA(IU,NXX,P,MI,NK,NGC,NGS,M,ICX,MS,DF,LX,LAG,X,

* MXN,MXP,MSC) 

INTEGER LX(MXP),LAG(MXP), IU,P 
REAL X(MXN) 

READ(IU,*,END=100) NXX,P,MI,NK,NGC,NGS,M,ICX,MS,DF,MSC
READ(IU,*,END=888) (LX(J),J=1,P)

READ(IU,*,END=888) (LAG(J),J=1,P)

READ(IU,*,END=888) (X(I),I=1,NXX)

CLOSE(IU)

DO 10 I=2,P

IF(LAG(I).LE.LAG(I-1)) THEN

WRITE(6,*) 'LAGS NOT IN ASCENDING ORDER OR DUPLICATE, UNIT=',IU
STOP 
ENDIF 

10 CONTINUE 
return 
100 continue 
c empty file

nxx=0

p=0

RETURN

888 CONTINUE

WRITE(6,*) 'FILE INCOMPLETE FOR THIS TIME SERIES'

STOP 

END 
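The index arithmetic in MARSBLD can be checked with a short transliteration. The Python sketch below is a hypothetical equivalent using 0-based indexing. It follows the program's steps: set LP to the largest lag used, set the number of rows to N = NXX - LP, take the response from the first series, and fill each predictor column with the value LG(JJ) steps back. It is illustrative only; it omits the unit weights W and does not write the MARS30a.DAT file.

```python
from typing import List, Sequence, Tuple


def build_regression(series: Sequence[Sequence[float]],
                     lags: Sequence[Sequence[int]]) -> Tuple[List[List[float]], List[float]]:
    """Python analogue of MARSBLD: response from series[0], predictors from lagged series.

    series[k] is the k-th input series; lags[k] holds its strictly ascending lags
    (an empty list means the series contributes no predictors).
    """
    nxx = len(series[0])
    if any(len(s) != nxx for s in series):
        raise ValueError("series are not of the same length")
    for lg in lags:
        if any(b <= a for a, b in zip(lg, lg[1:])):
            raise ValueError("lags not in ascending order or duplicate")
    lp = max(lg[-1] for lg in lags if lg)        # LP = MAX(LG1(P1), L2, L3)
    n = nxx - lp                                 # N = NXX - LP
    y = [series[0][ii + lp] for ii in range(n)]  # Y(II) = X1(II+LP)
    x = [[s[ii + lp - lag] for s, lg in zip(series, lags) for lag in lg]
         for ii in range(n)]                     # X(II,JJ) = Xk(II+LP-LGk(JJ))
    return x, y


x, y = build_regression([list(range(10)), list(range(100, 110))], [[1, 3], [2]])
print(len(x), x[0], y[0])
```

The same offsets work in both indexings because each predictor is addressed relative to the response position II+LP.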
APPENDIX C. NDP FORTRAN PROGRAM FOR EXECUTING
THE MARS 3.0 PROGRAM



C This is the MARSDRVA.F FORTRAN file 01 JULY 1991

C *****************************************************************

C J. STEVENS - L. URIBE - P.A.W. LEWIS. e-mail 1526P@NAVPGS.BITNET
C *****************************************************************

C DRIVER PROGRAM FOR RUNNING MARS 3.0. IT FIRST COMPUTES ALL THE ARRAY 
c SPACE ALLOCATIONS, AND THEN RUNS THE MARS REGRESSION, USING THE 
C INPUT PREPARED BY MARSBLDA.F WHICH IS CALLED BY MARSTSA.BAT. 

PARAMETER (NV= 20000) 

C SET UP WORKING STORAGE: 

REAL V(NV) 

INTEGER INTV(NV) 

CHARACTER*8 FIN 
EQUIVALENCE (V, INTV) 

C 

OPEN(10,FILE='mars30a.DAT',ERR=999)

OPEN(6,FILE='mars30a.out',ERR=999)
c lu FIN='MARS30A'

c lu OPEN(10,FILE=FIN//'.DAT',ERR=999)
c lu OPEN(6,FILE=FIN//'.OUT')

WRITE(6,'(/,'' DRIVER FOR MARS 3.0.'')')

C 

C READ IN DATA: 

C 

READ(10,*,END=999) N,NP1,NP2,NP3,MI,NK,NGC,NGS,M,ICX,MS,DF,MSC
NP=NP1+NP2+NP3

WRITE(6,121) N,NP,NP1,NP2,NP3,MI,NK,NGC,NGS,M,ICX,MS,DF,MSC 



121 FORMAT(/,' NO. OF OBSERVATIONS N: ',I6,

* /,' TOTAL NO. OF PREDICTORS P: ',I6,

* /,' NO. OF PREDICTORS/TIME SERIES ',3I6,

* /,' MAX NO. OF INTERACTIONS MI: ',I6,

* /,' MAX NO OF BASIS FUNCTIONS NK: ',I6,

* /,' NO. OF RASTER POINTS FOR PLOTTING NGC:',I6,

* /,' NO. OF R.P. FOR SURFACE ESTIMATES NGS:',I6,

* /,' MODEL FLAG (1=LINEAR, 2=CUBIC) M:',I6,

* /,' CONVEX HULL FLAG FOR PLOTS ICX:',I6,
* /,' MINIMUM SPAN MS:',I6,

* /,' DEGREES OF FREEDOM DF:',F6.1,

* /,' MODEL SELECTION CRITERIA MSC:',I6,

* /,' (1=GCV, 2=AIC, 3=PC, 4=SC)')




C 



IF(NP.LE.0) RETURN

IPLX=1

IPX=IPLX+NP 

IPY=IPX+N*NP 

IPW=IPY+N 

IPLAG=IPW+N 

IPIM=IPLAG+NP

IPSP=IPIM + 21+NK*(3*MI+8) 

C --- SP ALLOCATED FOR MARS OR PLOT, WHICHEVER IS LARGEST

LSP=N*(MAX(NK,2)+3)+MAX(3*N+5*NK+NP,2*NP,4*N)+2*NP+4*NK
LSP=MAX(LSP,4*NGS*NGS,NGC,2*N)

IPMM=IPSP + LSP 
IF(IPMM .GT. NV) THEN

WRITE(6,*) '***** MEMORY REQUIREMENTS EXCEEDED FOR X *****'
WRITE(6,*) '***** MEMORY REQUESTED, AVAILABLE=',IPMM,NV
STOP
ENDIF

C --- READ LX, W AND LAGS

CALL READLXW(INTV(IPLX),V(IPW),N,NP,INTV(IPLAG))

C READ X AND Y

CALL READXY(V(IPX),V(IPY),N,NP)

C --- COMPUTE NMCV,NTCV FROM THE DATA X

CALL COMPCV(V(IPX),N,NP,INTV(IPLX),NMCV,NTCV,INTV(IPIM))

LMM=MAX(N*NP+2*MAX(MI,NMCV),2*(MI+1),NMCV)

IPFM=IPMM + LMM

IPDP=IPFM + 3+NK*(5*MI+NMCV+6)+2*NP+NTCV
IPDP=FLOAT(IPDP)/8. + 1.

IPDP=8*IPDP

IPEND=IPDP+8+2*MAX(N*NK,(NK+1)*(NK+1))+MAX((NK+2)*(NMCV+3),4*NK)
C* WRITE(6,*) 'IP=LX,X,Y,W,IM,SP,MM,FM,DP,END=',
C* + IPLX,IPX,IPY,IPW,IPIM,IPSP,IPMM,IPFM,IPDP,IPEND

IF(IPEND .GT. NV) THEN

WRITE(6,*) '***** MEMORY REQUIREMENTS EXCEEDED *****'

WRITE(6,*) '***** MEMORY REQUESTED, AVAILABLE=',IPEND,NV

STOP

ENDIF
c

WRITE(6,*) 'LAGS AND LX FOR TIME SERIES 1'

WRITE(6,122) (INTV(IPLAG-1+I),I=1,NP1)

WRITE(6,122) (INTV(IPLX-1+I),I=1,NP1)

WRITE(6,*) 'LAGS AND LX FOR TIME SERIES 2'

WRITE(6,122) (INTV(IPLAG+NP1-1+I),I=1,NP2)

WRITE(6,122) (INTV(IPLX+NP1-1+I),I=1,NP2)

WRITE(6,*) 'LAGS AND LX FOR TIME SERIES 3'

WRITE(6,122) (INTV(IPLAG+NP1+NP2-1+I),I=1,NP3)

WRITE(6,122) (INTV(IPLX+NP1+NP2-1+I),I=1,NP3)

122 FORMAT(20I5)

WRITE(6,*) 'START MARS. MEMORY NEEDED / AVAILABLE=',IPEND,NV

WRITE(6,*) ' '

C 

CALL SETMS(MS) 

CALL SETDF(DF) 

CALL MARS(N,NP,V(IPX),V(IPY),V(IPW),NK,MI,V(IPLX),V(IPFM), 

* V(IPIM),V(IPSP),V(IPDP),V(IPMM), MSC) 

C 

CALL DISPFM(V(IPFM+1),V(IPDP),NK,MI, V(IPFM) ) 

C 

C CONSTRUCT PLOTS FOR INTERPRETING RESULTING MODEL: 

C 

IF(NGC.EQ.0 .AND. NGS.EQ.0) RETURN
C

IPCRV=IPDP

IPSRF=IPCRV + 2*NGC*NK
IPEND=IPSRF + NGS*NGS*NK
IF(IPEND .GT. NV) THEN

WRITE(6,*) '***** MEMORY REQUIREMENTS EXCEEDED FOR PLOT *****'
WRITE(6,*) '***** MEMORY REQUESTED, AVAILABLE=',IPEND,NV
STOP 
ENDIF 

CALL PLOT (M,V(IPX),V(IPFM),V(IPIM),NGC,NGS,ICX, 

* NC,V(IPCRV),NS,V(IPSRF),V(IPSP),V(IPMM) ) 

C WRITE PLOTS TO OUTPUT FILES FOR PLOTTING WITH LOCAL GRAPHICS PACKAGE; 
C 

WRITE(6,*) 'PLOT IPEND,NC,NS=',IPEND,NC,NS
CALL WPLOT(V(IPCRV),V(IPSRF),NGC,NC, NGS,NS, FIN) 

RETURN 

999 CONTINUE 

WRITE(6,*) 'ERROR OPENING INPUT FILE:', FIN
END 
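NV is the only value the user may have to change, so it can help to evaluate the workspace arithmetic before recompiling. The Python sketch below transcribes the pointer formulas of the driver above into integer arithmetic. For positive values, the FLOAT(IPDP)/8. + 1. step followed by 8*IPDP amounts to 8*(IPDP/8 + 1) in integer division, i.e., the next multiple of 8 strictly above IPDP. `workspace_pointers` is a hypothetical name. The counts NMCV and NTCV, which the driver computes from the data in COMPCV, must be supplied by the caller; both are 0 when no categorical predictors are used.

```python
from typing import Dict


def workspace_pointers(n: int, np_: int, mi: int, nk: int, ngc: int = 0, ngs: int = 0,
                       nmcv: int = 0, ntcv: int = 0) -> Dict[str, int]:
    """Offsets into the V/INTV work array as computed by the MARSDRVa driver (transcribed)."""
    p: Dict[str, int] = {}
    p["IPLX"] = 1
    p["IPX"] = p["IPLX"] + np_
    p["IPY"] = p["IPX"] + n * np_
    p["IPW"] = p["IPY"] + n
    p["IPLAG"] = p["IPW"] + n
    p["IPIM"] = p["IPLAG"] + np_
    p["IPSP"] = p["IPIM"] + 21 + nk * (3 * mi + 8)
    lsp = n * (max(nk, 2) + 3) + max(3 * n + 5 * nk + np_, 2 * np_, 4 * n) + 2 * np_ + 4 * nk
    lsp = max(lsp, 4 * ngs * ngs, ngc, 2 * n)
    p["IPMM"] = p["IPSP"] + lsp
    lmm = max(n * np_ + 2 * max(mi, nmcv), 2 * (mi + 1), nmcv)
    p["IPFM"] = p["IPMM"] + lmm
    ipdp = p["IPFM"] + 3 + nk * (5 * mi + nmcv + 6) + 2 * np_ + ntcv
    p["IPDP"] = 8 * (ipdp // 8 + 1)  # IPDP = FLOAT(IPDP)/8. + 1. ; IPDP = 8*IPDP
    p["IPEND"] = (p["IPDP"] + 8 + 2 * max(n * nk, (nk + 1) * (nk + 1))
                  + max((nk + 2) * (nmcv + 3), 4 * nk))
    return p


small = workspace_pointers(n=10, np_=2, mi=1, nk=5)
print(small["IPDP"], small["IPEND"])

# Assumed sizes resembling the Vatnsdalsa runs: 722 rows, 27 predictors, 3-way, NK = 50.
big = workspace_pointers(n=722, np_=27, mi=3, nk=50)
print(big["IPEND"], "words needed versus NV = 20000")
```

Under these assumed sizes, the requirement is far above the listed NV = 20000. This is consistent with the note in Appendix A that NV may need adjustment and MARSDRVa recompilation.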
c

SUBROUTINE READXY(X,Y,N,NP)

REAL X(N,NP), Y(N)

DO 1 I=1,N

READ(10,*) (X(I,J),J=1,NP),Y(I)

1 CONTINUE
END
C
      SUBROUTINE COMPCV(X,N,NP,LX,NMCV,NTCV,WK)
      REAL X(N,NP)
      INTEGER LX(NP)
      INTEGER WK(N)
      NMCV=0
      NTCV=0
C FOR THE CATEGORICAL PREDICTORS (LX(J) < 0), FIND THE MAXIMUM NUMBER
C OF DISTINCT VALUES IN ANY ONE COLUMN (NMCV) AND THE TOTAL NUMBER OF
C DISTINCT VALUES SUMMED OVER ALL SUCH COLUMNS (NTCV)
      DO 30 J=1,NP
         IF(LX(J).LT.0) THEN
            NCAT=NUMCAT(X(1,J),N,WK)
         ELSE
            NCAT=0
         ENDIF
         NTCV=NTCV+NCAT
         NMCV=MAX(NMCV,NCAT)
   30 CONTINUE
      END
C
      FUNCTION NUMCAT(V,N,WK)
C --- FIND NUMBER OF DISTINCT VALUES IN V
      REAL V(N)
      INTEGER WK(N)
C INITIALIZE THE DUPLICATE-FLAG ARRAY WK TO 0
      DO 10 I=1,N
         WK(I)=0
   10 CONTINUE
      NUMCAT=0
      DO 20 I=1,N
         VAL=V(I)
C COUNT V(I) ONLY IF IT HAS NOT ALREADY BEEN FLAGGED AS A DUPLICATE,
C THEN FLAG ALL LATER OCCURRENCES OF THE SAME VALUE
         IF(WK(I).EQ.0) THEN
            I1=I+1
            DO 30 J=I1,N
               IF(VAL.EQ.V(J)) WK(J)=1
   30       CONTINUE
            NUMCAT=NUMCAT+1
         ENDIF
   20 CONTINUE
      END
C
      SUBROUTINE READLXW(LX,W,N,NP,LAG)
C READ VALUES FOR LX, LAG AND W FROM UNIT 10
      INTEGER LX(NP),LAG(NP)
      REAL W(N)
      READ(10,*) LX
      READ(10,*) LAG
      READ(10,*) W
      END
C
      SUBROUTINE WPLOT(CRV,SRF,NGC,NC,NGS,NS,FNAME)
      REAL CRV(NGC,2,NC), SRF(NGS,NGS,NS)
      CHARACTER*8 FNAME
      IF(NC.GT.0) THEN
         OPEN(11,FILE=FNAME//'.CUR',FORM='UNFORMATTED')
         WRITE(11) NGC,NC,CRV
      ENDIF
      IF(NS.GT.0) THEN
         OPEN(12,FILE=FNAME//'.SUR',FORM='UNFORMATTED')
         WRITE(12) NGS,NS,SRF
      ENDIF
      END

C
      SUBROUTINE DISPFM(FM,MMM,NK,MI,CONST)
C ANALYSIS OF FM FOR MARS 3.0
      REAL FM(5,NK),MMM(NK,2,MI+1),VMIN(100),MMD1(100),MMD2(100)
      INTEGER OUT(100)
C WRITE OUT THE MODEL CONSTANT
      WRITE(6,*) ' '
      WRITE(6,112) 'MODEL CONSTANT = ',CONST
      WRITE(6,*) ' '
C INITIALIZE MMM
      DO 18 I=1,NK
         DO 17 J=1,2
            DO 16 K=1,MI+1
               MMM(I,J,K)=100.
   16       CONTINUE
   17    CONTINUE
   18 CONTINUE
C
C COLLECT THE TERMS WITH NONZERO COEFFICIENTS.  FOR TERM IC1:
C   MMM(IC1,2,1)       = COEFFICIENT
C   MMM(IC1,1,K), K>1  = SIGNED VARIABLE OF FACTOR K-1
C   MMM(IC1,2,K), K>1  = KNOT OF FACTOR K-1
C   MMM(IC1,1,1)       = NUMBER OF FACTORS (INTERACTION LEVEL)
C THE FACTORS ARE FOUND BY FOLLOWING THE PARENT POINTERS IN FM(4,.)
      IC1=0
      DO 20 I=1,NK
         IF(FM(1,I).EQ.0.) GOTO 20
         IC1=IC1+1
         IC2=2
         MMM(IC1,2,1)=FM(1,I)
         MMM(IC1,1,IC2)=FM(2,I)
         MMM(IC1,2,IC2)=FM(3,I)
C PARENT INDEX IS STORED AS A REAL; CONVERT IT TO AN INTEGER SUBSCRIPT
         IDUM=NINT(FM(4,I))
   19    IF(IDUM.EQ.0) GOTO 21
         IC2=IC2+1
         MMM(IC1,1,IC2)=FM(2,IDUM)
         MMM(IC1,2,IC2)=FM(3,IDUM)
         IDUM=NINT(FM(4,IDUM))
         GOTO 19
   21    MMM(IC1,1,1)=IC2-1
   20 CONTINUE
C
C SORT THE FACTORS OF EACH TERM BY ABSOLUTE VARIABLE NUMBER
      DO 50 J=1,IC1
         KEND=MMM(J,1,1)
         DO 49 K=1,KEND
            MMD1(K)=ABS(MMM(J,1,K+1))
            VMIN(K)=MMM(J,1,K+1)
            MMD2(K)=MMM(J,2,K+1)
            OUT(K)=K
   49    CONTINUE
         CALL PSORT(MMD1,OUT,1,KEND)
         DO 48 K=1,KEND
            MMM(J,1,K+1)=VMIN(OUT(K))
            MMM(J,2,K+1)=MMD2(OUT(K))
   48    CONTINUE
   50 CONTINUE
C
C PRINT INTERACTION VARIABLE REPORT: FOR EACH LEVEL I, REPEATEDLY PICK
C THE UNPRINTED TERM WHOSE ABSOLUTE VARIABLE NUMBERS ARE
C LEXICOGRAPHICALLY SMALLEST, PRINT IT, AND MARK IT AS PRINTED
      DO 100 I=1,MI
         WRITE(6,*) '*************'
         WRITE(6,*) 'INTERACTION LEVEL:', I
         WRITE(6,*) '*************'
  101    CONTINUE
         DO 102 J=1,MI
            VMIN(J)=999999.
  102    CONTINUE
         DO 110 J=1,IC1
            KEND=MMM(J,1,1)+1
            IF(KEND-1 .NE. I) GO TO 110
            DO 120 K=2,KEND
               IF(ABS(MMM(J,1,K)).LT.VMIN(K-1)) THEN
                  J1=J
                  DO 130 KK=2,KEND
                     VMIN(KK-1)=ABS(MMM(J,1,KK))
  130             CONTINUE
                  GO TO 110
               ELSE
                  IF(ABS(MMM(J,1,K)).GT.VMIN(K-1)) GO TO 110
               ENDIF
  120       CONTINUE
  110    CONTINUE
C
         IF(VMIN(1).NE.999999.) THEN
            WRITE(6,111) 'VARIABLES ',(MMM(J1,1,KK),KK=2,I+1)
            WRITE(6,112) 'COEFF AND KNOTS ',(MMM(J1,2,KK),KK=1,I+1)
            WRITE(6,*) ' '
C MARK THE TERM AS PRINTED (ITS LEVEL NO LONGER MATCHES ANY I)
            MMM(J1,1,1)=MMM(J1,1,1)+MI
            GO TO 101
         ENDIF
  100 CONTINUE
      RETURN
  111 FORMAT(A25,15X,7F10.3)
  112 FORMAT(A25,F15.6,7F10.3)
      END
APPENDIX D. MARS OUTPUT FOR ASTAR MODEL GRANITE2



DRIVER FOR MARS 3.X.

NO. OF OBSERVATIONS N:                          4330
TOTAL NO. OF PREDICTORS P:                        52
NO. OF PREDICTORS/TIME SERIES:              50   1   1
MAX NO. OF INTERACTIONS MI:                        3
MAX NO OF BASIS FUNCTIONS NK:                     60
NO. OF RASTER POINTS FOR PLOTTING NGC:             0
NO. OF R.P. FOR SURFACE ESTIMATES NGS:             0
MODEL FLAG (1=LINEAR, 2=CUBIC) M:                  0
CONVEX HULL FLAG FOR PLOTS ICX:                    0
MINIMUM SPAN: MS                                  50
DEGREES OF FREEDOM DF:                           3.0
MODEL SELECTION CRITERIA (MSC):
  (1=GCV, 2=AIC, 3=PC, 4=SC)                     GCV

LAGS AND LX FOR TIME SERIES 1
    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
   16   17   18   19   20   21   22   23   24   25   26   27   28   29   30
   31   32   33   34   35   36   37   38   39   40   41   42   43   44   45
   46   47   48   49   50
    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1
    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1
    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1
    1    1    1    1    1
LAGS AND LX FOR TIME SERIES 2
    0
    3
LAGS AND LX FOR TIME SERIES 3
    0
    3

START MARS. MEMORY NEEDED/AVAILABLE=     1150368     1200000



MARS MODELING, VERSION 3.5a (6/16/91) 



INPUT PARAMETERS (SEE DOC.):

    N     P    NK    MS    MI      DF    IL      FV    IC
 4330    52    60    50     3   3.000     0   0.000     0


PREDICTOR VARIABLE FLAGS:

 VAR:   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
FLAG:   1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 VAR:  21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
FLAG:   1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 VAR:  41 42 43 44 45 46 47 48 49 50 51 52
FLAG:   1  1  1  1  1  1  1  1  1  1  3  3










ORDERABLE RESPONSE:

       MIN       N/4       N/2      3N/4       MAX
     8.000     10.60     11.70     12.90     17.00


THERE ARE 52 ORDERABLE PREDICTOR VARIABLES

 VAR        MIN        N/4        N/2       3N/4        MAX
   1      8.000      10.60      11.70      12.90      17.00
   2      8.000      10.60      11.70      12.90      17.00
   3      8.000      10.60      11.70      12.90      17.00
   4      8.000      10.60      11.70      12.90      17.00
   5      8.000      10.50      11.70      12.90      17.00
   6      8.000      10.50      11.70      12.90      17.00
   7      8.000      10.50      11.70      12.90      17.00
   8      8.000      10.50      11.70      12.90      17.00
   9      8.000      10.50      11.70      12.90      17.00
  10      8.000      10.50      11.70      12.90      17.00
  11      8.000      10.50      11.70      12.90      17.00
  12      8.000      10.50      11.70      12.90      17.00
  13      8.000      10.50      11.60      12.90      17.00
  14      8.000      10.50      11.60      12.90      17.00
  15      8.000      10.50      11.60      12.90      17.00
  16      8.000      10.50      11.60      12.90      17.00
  17      8.000      10.50      11.60      12.90      17.00
  18      8.000      10.50      11.60      12.90      17.00
  19      8.000      10.50      11.60      12.90      17.00
  20      8.000      10.50      11.60      12.90      17.00
  21      8.000      10.50      11.60      12.90      17.00
  22      8.000      10.50      11.60      12.90      17.00
  23      8.000      10.50      11.60      12.90      17.00
  24      8.000      10.50      11.60      12.90      17.00
  25      8.000      10.50      11.60      12.90      17.00
  26      8.000      10.50      11.60      12.90      17.00
  27      8.000      10.50      11.60      12.90      17.00
  28      8.000      10.50      11.60      12.90      17.00
  29      8.000      10.50      11.60      12.90      17.00
  30      8.000      10.50      11.60      12.90      17.00
  31      8.000      10.50      11.60      12.90      17.00
  32      8.000      10.50      11.60      12.90      17.00
  33      8.000      10.50      11.60      12.90      17.00
  34      8.000      10.50      11.60      12.90      17.00
  35      8.000      10.50      11.60      12.80      17.00
  36      8.000      10.50      11.60      12.80      17.00
  37      8.000      10.50      11.60      12.80      17.00
  38      8.000      10.50      11.60      12.80      17.00
  39      8.000      10.50      11.60      12.80      17.00
  40      8.000      10.50      11.60      12.80      17.00
  41      8.000      10.50      11.60      12.80      17.00
  42      8.000      10.50      11.60      12.80      17.00
  43      8.000      10.50      11.60      12.80      17.00
  44      8.000      10.50      11.60      12.80      17.00
  45      8.000      10.50      11.60      12.80      17.00
  46      8.000      10.50      11.60      12.80      17.00
  47      8.000      10.50      11.60      12.80      17.00
  48      8.000      10.50      11.60      12.80      17.00
  49      8.000      10.50      11.60      12.80      17.00
  50      8.000      10.50      11.60      12.80      17.00
  51     -1.000    -0.7000 0.0000E+00     0.7000      1.000
  52     -1.000    -0.7000 0.0000E+00     0.7000      1.000



FORWARD STEPWISE KNOT PLACEMENT:

BASFN(S)       MSC  #INDBSFNS    #EFPRMS   VARIABLE      KNOT    PARENT
  0         2.6000        0.0        1.0
  2   1     0.3031        2.0        5.9         1.     15.40        0.
  3         0.2983        3.0        9.8        14.     8.000        0.
  4         0.2972        4.0       13.8         2.     8.000        0.
  6   5     0.2959        6.0       18.7        19.     9.100        4.
  7         0.2944        7.0       22.6        51.    -1.000        0.
  9   8     0.2928        9.0       27.5        35.     10.00        5.
 11  10     0.2923       11.0       32.5        17.     13.40        4.
 13  12     0.2918       13.0       37.4         2.     14.90        2.
 15  14     0.2915       15.0       42.3         3.     14.80        3.
 16         0.2905       16.0       46.2         2.     8.000       14.
 18  17     0.2901       18.0       51.2         2.     14.80        3.
 19         0.2892       19.0       55.1        36.     8.000       17.
 21  20     0.2886       21.0       60.0         1.     14.90       11.
 23  22     0.2883       23.0       64.9         7.     11.80       11.
 25  24     0.2880       25.0       69.8         1.     13.10       18.
 27  26     0.2878       27.0       74.8        31.     13.40       10.
 28         0.2874       28.0       78.7        26.     8.000       17.
 30  29     0.2873       30.0       83.6        39.     15.00       18.
 32  31     0.2871       32.0       88.5        36.     12.40        4.
 34  33     0.2864       34.0       93.5         1.     14.30       32.
 36  35     0.2862       36.0       98.4         3.     13.60        4.
 38  37     0.2861       38.0      123.3        47.     11.80       31.
 40  39     0.2860       40.0      138.2        35.     13.30       36.
 42  41     0.2858       42.0      133.2         5.     12.80       31.
 44  43     0.2857       44.0      148.1         1.     10.90       31.
 46  45     0.2855       46.0      123.0        29.     10.10       15.
 48  47     0.2852       48.0      137.9        20.     9.500        0.
 50  49     0.2852       50.0      132.8        45.     13.00       35.
 52  51     0.2851       52.0      147.8        44.     14.90       10.
 54  53     0.2851       54.0      122.7        30.     10.20       32.
 56  55     0.2849       56.0      137.6        35.     15.40        3.
 58  57     0.2849       58.0      132.5        15.     15.00       13.
 60  59     0.2848       60.0      147.5        25.     9.500       56.



FINAL MODEL AFTER BACKWARD STEPWISE ELIMINATION:

BSFN:           0           1           2           3           4           5
COEF:      15.778      1.2432 -1.0419E+00  0.0000E+00  0.7511E-01 -0.1441E-01
BSFN:           6           7           8           9          10          11
COEF:     -0.1587     -0.1026  0.4741E-02 -0.2700E-01  0.2597E-01 -0.1784E-01
BSFN:          12          13          14          15          16          17
COEF:  0.3677E+00  0.0000E+00 -0.3553E+00  0.0000E+00  0.4409E-01 -0.2146E+00
BSFN:          18          19          20          21          22          23
COEF:  0.0000E+00  0.5120E-01  0.5373E-01  0.5596E-02  0.7694E-02 -0.5888E-02
BSFN:          24          25          26          27          28          29
COEF: -0.1822E-01  0.0000E+00  0.1478E-01  0.1307E-01  0.3072E-01  0.4366E-01
BSFN:          30          31          32          33          34          35
COEF:  0.0000E+00  0.0000E+00 -0.2142E-01 -0.5539E-01  0.0000E+00  0.0000E+00
BSFN:          36          37          38          39          40          41
COEF:  0.0000E+00  0.0000E+00 -0.2932E-01  0.2563E-01  0.0000E+00  0.0000E+00
BSFN:          42          43          44          45          46          47
COEF: -0.2426E-01 -0.3663E-02  0.7473E-01 -0.2695E-02 -0.1881E-01  0.5118E-01
BSFN:          48          49          50          51          52          53
COEF:  0.0000E+00  0.1456E-01  0.1286E-01 -0.4244E-01  0.0000E+00  0.5698E-02
BSFN:          54          55          56          57          58          59
COEF:  0.1640E-01 -0.4912E-01  0.1513E-01  0.5708E-01  0.1159E-02  0.0000E+00
BSFN:          60
COEF:  0.1651E-01

(PIECEWISE LINEAR) MSC = 0.2808 #EFPRMS = 115.7 
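For orientation only, and not part of the program output: assume the MSC reported here is Friedman's generalized cross-validation criterion. Assume also that the #EFPRMS value plays the role of the complexity cost C(M) and that all observation weights are 1. The modified GCV* used in this study may differ in detail. Under those assumptions, the reported MSC = 0.2808, with #EFPRMS = 115.7 and N = 4330, implies an average squared residual of roughly 0.266 for the piecewise-linear fit:

$$
\mathrm{GCV}(M) = \frac{\frac{1}{N}\sum_{i=1}^{N}\bigl[y_i - \hat{f}_M(\mathbf{x}_i)\bigr]^2}{\bigl[1 - C(M)/N\bigr]^2}
\quad\Longrightarrow\quad
\frac{1}{N}\sum_{i=1}^{N}\bigl[y_i - \hat{f}_M(\mathbf{x}_i)\bigr]^2 \approx 0.2808\left(1 - \frac{115.7}{4330}\right)^{2} \approx 0.2808 \times 0.9473 \approx 0.2660 .
$$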



ANOVA DECOMPOSITION ON 44 BASIS FUNCTIONS:

 FUN.   STD. DEV.      -MSC   #BSFNS   #EFPRMS   VARIABLE(S)
    1       1.682    0.4456        2       5.8      1
    2      0.1211    0.2812        1       2.6      2
    3  0.7252E-01    0.2846        1       2.6     51
    4  0.7980E-01    0.2814        1       2.6     20
    5      0.1451    0.2813        2       5.2      2  19
    6  0.7697E-01    0.2817        2       2.9      2  17
    7  0.4403E-01    0.2815        1       5.8      1   2
    8      0.3416    0.2820        1       2.9      3  14
    9      0.1995    0.2813        1       2.9      2  14
   10  0.7316E-01    0.2819        1       5.8      2  36
   11  0.9518E-01    0.2828        2       2.9     14  35
   12      0.2111    0.2825        2       2.9      2  19  35
   13      0.3274    0.2817        1       5.8      2   3  14
   14      0.2728    0.2822        1       2.9      2  14  36
   15  0.8892E-01    0.2816        2       2.9      1   2  17
   16  0.4887E-01    0.2814        2       5.8      2   7  17
   17  0.3807E-01    0.2814        1       2.9      1   2  14
   18  0.5182E-01    0.2810        2       2.9      2  17  31
   19      0.1540    0.2813        1       5.8      2  14  26
   20  0.3989E-01    0.2819        1       2.9      2  14  39
   21  0.7842E-01    0.2822        3       2.9      1   2  36
   22  0.2995E-01    0.2813        1       5.8      2  36  47
   23  0.4398E-01    0.2816        1       2.9      2   3  35
   24  0.4913E-01    0.2818        1       2.9      2   5  36
   25  0.6054E-01    0.2822        2       5.8      3  14  29
   26  0.6108E-01    0.2816        2       2.9      2   3  45
   27  0.3383E-01    0.2814        1       2.9      2  17  44
   28  0.4288E-01    0.2809        2       5.8      2  30  36
   29  0.6748E-01    0.2813        2       2.9      1   2  15
   30  0.2573E-01    0.2810        1       2.9     14  25  35



PIECEWISE CUBIC FIT ON 44 BASIS FUNCTIONS, MSC = .2867



-MSC FOR REMOVING EACH VARIABLE:

    0.5545    0.2981    0.2838    0.2808    0.2818
    0.2808    0.2814    0.2808    0.2808    0.2808
    0.2808    0.2808    0.2808    0.2876    0.2813
    0.2808    0.2832    0.2808    0.2822    0.2814
    0.2808    0.2808    0.2808    0.2808    0.2810
    0.2813    0.2808    0.2808    0.2823    0.2809
    0.2810    0.2808    0.2808    0.2808    0.2837
    0.2868    0.2808    0.2808    0.2819    0.2808
    0.2808    0.2808    0.2808    0.2814    0.2816
    0.2808    0.2813    0.2808    0.2808    0.2808
    0.2846    0.2808

RELATIVE VARIABLE IMPORTANCE

      100.0      25.11      10.40 0.0000E+00      6.115
 0.0000E+00      4.666 0.0000E+00 0.0000E+00 0.0000E+00
 0.0000E+00 0.0000E+00 0.0000E+00      15.77      4.220
 0.0000E+00      9.337 0.0000E+00      7.103      4.619
 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00      2.542
      4.061 0.0000E+00 0.0000E+00      7.224      1.871
      2.629 0.0000E+00 0.0000E+00 0.0000E+00      10.26
      14.83 0.0000E+00 0.0000E+00      6.397 0.0000E+00
 0.0000E+00 0.0000E+00 0.0000E+00      4.539      5.375
 0.0000E+00      4.024 0.0000E+00 0.0000E+00 0.0000E+00
      11.75 0.0000E+00



MARS OUTPUT MODEL

THE VARIABLE SIGN INDICATES A LEFT (- SIGN) OR RIGHT (+ SIGN)
TRUNCATED SPLINE FUNCTION WITH THE INDICATED KNOT

MODEL CONSTANT = 15.778527 
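As a reading aid (not part of the program output), the terms listed below can be assembled into the fitted model. This sketch assumes the standard MARS convention that a signed variable s·v with knot t denotes the truncated linear factor [s(x_v - t)]_+ = max(0, s(x_v - t)). Following the LAGS echo at the start of this output, it also assumes that predictor v ≤ 50 is the modeled series at lag v, written here as y_{t-v}. The symbol x_{51} is illustrative notation for predictor 51, the lag-0 value of time series 2. With the model constant and the five level-1 terms, the fit begins:

$$
\hat{y}_t = 15.778527 - 1.041947\,(15.4 - y_{t-1})_+ + 1.124321\,(y_{t-1} - 15.4)_+ - 0.075109\,(y_{t-2} - 8.0)_+ + 0.051177\,(y_{t-20} - 9.5)_+ - 0.102553\,(x_{51} + 1.0)_+ + \cdots
$$

Each higher-level term is a product of such factors. For example, the first level-3 entry contributes -0.018219 (y_{t-1} - 13.1)_+ (14.8 - y_{t-2})_+ (y_{t-14} - 8.0)_+.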



*************
INTERACTION LEVEL: 1
*************

VARIABLES                                   -1.000
COEFF AND KNOTS                -1.041947    15.400

VARIABLES                                    1.000
COEFF AND KNOTS                 1.124321    15.400

VARIABLES                                    2.000
COEFF AND KNOTS                -0.075109     8.000

VARIABLES                                   20.000
COEFF AND KNOTS                 0.051177     9.500

VARIABLES                                   51.000
COEFF AND KNOTS                -0.102553    -1.000

*************
INTERACTION LEVEL: 2
*************

VARIABLES                                   -1.000     2.000
COEFF AND KNOTS                 0.367790    15.400    14.900

VARIABLES                                    2.000    14.000
COEFF AND KNOTS                -0.214606    14.800     8.000

VARIABLES                                    2.000   -17.000
COEFF AND KNOTS                -0.017842     8.000    13.400

VARIABLES                                    2.000    17.000
COEFF AND KNOTS                -0.025969     8.000    13.400

VARIABLES                                    2.000   -19.000
COEFF AND KNOTS                -0.158722     8.000     9.100

VARIABLES                                    2.000    19.000
COEFF AND KNOTS                -0.014415     8.000     9.100

VARIABLES                                    2.000   -36.000
COEFF AND KNOTS                -0.021417     8.000    12.400

VARIABLES                                    3.000    14.000
COEFF AND KNOTS                -0.355389    14.800     8.000

VARIABLES                                   14.000   -35.000
COEFF AND KNOTS                 0.015131     8.000    15.400

VARIABLES                                   14.000    35.000
COEFF AND KNOTS                -0.049118     8.000    15.400


*************
INTERACTION LEVEL: 3
*************

VARIABLES                                    1.000    -2.000    14.000
COEFF AND KNOTS                -0.018219    13.100    14.800     8.000

VARIABLES                                   -1.000    -2.000   -15.000
COEFF AND KNOTS                 0.001159    15.400    14.900    15.000

VARIABLES                                   -1.000    -2.000    15.000
COEFF AND KNOTS                 0.057077    15.400    14.900    15.000

VARIABLES                                   -1.000     2.000   -17.000
COEFF AND KNOTS                 0.005596    14.900     8.000    13.400

VARIABLES                                    1.000     2.000    17.000
COEFF AND KNOTS                 0.053733    14.900     8.000    13.400

VARIABLES                                   -1.000     2.000    36.000
COEFF AND KNOTS                 0.074733    10.900     8.000    12.400

VARIABLES                                    1.000     2.000    36.000
COEFF AND KNOTS                -0.003663    10.900     8.000    12.400

VARIABLES                                    1.000     2.000   -36.000
COEFF AND KNOTS                -0.055390    14.300     8.000    12.400

VARIABLES                                    2.000     3.000    14.000
COEFF AND KNOTS                 0.044091     8.000    14.800     8.000

VARIABLES                                    2.000    -3.000    35.000
COEFF AND KNOTS                 0.025631     8.000    13.600    13.300

VARIABLES                                    2.000     3.000   -45.000
COEFF AND KNOTS                 0.012861     8.000    13.600    13.000

VARIABLES                                    2.000     3.000    45.000
COEFF AND KNOTS                 0.014562     8.000    13.600    13.000

VARIABLES                                    2.000    -5.000    36.000
COEFF AND KNOTS                -0.024261     8.000    12.800    12.400

VARIABLES                                    2.000    -7.000   -17.000
COEFF AND KNOTS                -0.005888     8.000    11.800    13.400

VARIABLES                                    2.000     7.000   -17.000
COEFF AND KNOTS                -0.007694     8.000    11.800    13.400

VARIABLES                                    2.000    14.000    26.000
COEFF AND KNOTS                -0.030715     8.000     8.000     8.000

VARIABLES                                    2.000    14.000    36.000
COEFF AND KNOTS                 0.051202     8.000     8.000     8.000

VARIABLES                                    2.000    14.000    39.000
COEFF AND KNOTS                 0.043657     8.000     8.000    15.000

VARIABLES                                    2.000    17.000   -31.000
COEFF AND KNOTS                 0.013065     8.000    13.400    13.400

VARIABLES                                    2.000    17.000    31.000
COEFF AND KNOTS                 0.014785     8.000    13.400    13.400

VARIABLES                                    2.000    17.000    44.000
COEFF AND KNOTS                -0.042443     8.000    13.400    14.900

VARIABLES                                    2.000    19.000   -35.000
COEFF AND KNOTS                -0.027002     8.000     9.100    10.000

VARIABLES                                    2.000    19.000    35.000
COEFF AND KNOTS                 0.004741     8.000     9.100    10.000

VARIABLES                                    2.000   -30.000   -36.000
COEFF AND KNOTS                 0.016398     8.000    10.200    12.400

VARIABLES                                    2.000    30.000   -36.000
COEFF AND KNOTS                 0.005698     8.000    10.200    12.400

VARIABLES                                    2.000    36.000   -47.000
COEFF AND KNOTS                -0.029326     8.000    12.400    11.800

VARIABLES                                   -3.000    14.000   -29.000
COEFF AND KNOTS                -0.018812    14.800     8.000    10.100

VARIABLES                                   -3.000    14.000    29.000
COEFF AND KNOTS                -0.002695    14.800     8.000    10.100

VARIABLES                                   14.000   -25.000   -35.000
COEFF AND KNOTS                 0.016511     8.000     9.500    15.400